








FAMOUS PROBLEMS 

AND OTHER MONOGRAPHS 


FAMOUS PROBLEMS OF 
ELEMENTARY GEOMETRY 

BY F. KLEIN 


FROM DETERMINANT 
TO TENSOR 

BY W. F. SHEPPARD 


INTRODUCTION TO 
COMBINATORY ANALYSIS 

BY P. A. MACMAHON 


THREE LECTURES ON 
FERMAT’S LAST THEOREM 

BY L. J. MORDELL 


AMS CHELSEA PUBLISHING 

American Mathematical Society • Providence, Rhode Island 


FAMOUS PROBLEMS OF ELEMENTARY GEOMETRY 
was originally published by Ginn and Company 

FROM DETERMINANT TO TENSOR 
was originally published by Oxford University Press 

INTRODUCTION TO COMBINATORY ANALYSIS 
was originally published by Cambridge University Press 

THREE LECTURES ON FERMAT’S LAST THEOREM 
was originally published by Cambridge University Press 

2000 Mathematics Subject Classification. Primary 00B10. 


Library of Congress Catalog Card Number 62-18132 
International Standard Book Number 0-8218-2674-3 


FIRST EDITION 1955 
SECOND EDITION 1962, 1980 


Copyright © 1962 by Chelsea Publishing Company 
Printed in the United States of America. 

Reprinted by the American Mathematical Society, 2000 
The American Mathematical Society retains all rights 
except those granted to the United States Government. 

The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability. 

Visit the AMS home page at URL: http://www.ams.org/




EDITOR’S PREFACE 

This work, like its companion volume, Squaring the Circle,
and other Monographs by Hobson et al., consists of a reprint
in one volume of several books on mathematics that were 
originally published as separate volumes. 

The reason for the selection of the four books that comprise 
this volume is that each is a valuable and important work and 
that each is of interest to a fairly wide circle of mathematicians 
and students. 

The reason for their inclusion in a single volume is neither 
learned nor recondite. The reason is purely economic: Re-
printed separately, the books would have to be priced at not
much less than the price of the whole present volume (if they 
could be so reprinted at all). Anyone who buys the book for 
the sake of one of the four volumes that it contains will surely 
find the other three of interest and will consider them to be a 
worthwhile and welcome addition to his library. 






FAMOUS PROBLEMS 

OF 

ELEMENTARY GEOMETRY 


THE DUPLICATION OF THE CUBE
THE TRISECTION OF AN ANGLE 
THE QUADRATURE OF THE CIRCLE 


AN AUTHORIZED TRANSLATION OF F. KLEIN’S
VORTRÄGE ÜBER AUSGEWÄHLTE FRAGEN DER ELEMENTARGEOMETRIE
AUSGEARBEITET VON F. TÄGERT

BY 

WOOSTER WOODRUFF BEMAN 
1850—1922 

AND 

DAVID EUGENE SMITH 

EMERITUS PROFESSOR OF MATHEMATICS IN COLUMBIA UNIVERSITY 


SECOND EDITION REVISED, AND ENLARGED WITH NOTES 
BY 

RAYMOND CLARE ARCHIBALD 

PROFESSOR OF MATHEMATICS IN BROWN UNIVERSITY 





PREFACE. 


The more precise definitions and more rigorous methods of 
demonstration developed by modern mathematics are looked 
upon by the mass of gymnasium professors as abstruse and 
excessively abstract, and accordingly as of importance only 
for the small circle of specialists. With a view to counteract- 
ing this tendency it gave me pleasure to set forth last summer 
in a brief course of lectures before a larger audience than 
usual what modern science has to say regarding the possibility 
of elementary geometric constructions. Some time before, I 
had had occasion to present a sketch of these lectures in an 
Easter vacation course at Göttingen. The audience seemed
to take great interest in them, and this impression has been 
confirmed by the experience of the summer semester. I ven- 
ture therefore to present a short exposition of my lectures to 
the Association for the Advancement of the Teaching of Math- 
ematics and the Natural Sciences, for the meeting to be held at 
Göttingen. This exposition has been prepared by Oberlehrer
Tägert, of Ems, who attended the vacation course just men-
tioned. He also had at his disposal the lecture notes written
out under my supervision by several of my summer semester 
students. I hope that this unpretending little book may con- 
tribute to promote the useful work of the association. 


Göttingen, Easter, 1895.


F. KLEIN. 






TRANSLATORS’ PREFACE. 


At the Göttingen meeting of the German Association for
the Advancement of the Teaching of Mathematics and the 
Natural Sciences, Professor Felix Klein presented a discus- 
sion of the three famous geometric problems of antiquity, 
— the duplication of the cube, the trisection of an angle, 
and the quadrature of the circle, as viewed in the light of 
modern research. 

This was done with the avowed purpose of bringing the 
study of mathematics in the university into closer touch with 
the work of the gymnasium. That Professor Klein is likely 
to succeed in this effort is shown by the favorable reception 
accorded his lectures by the association, the uniform commen- 
dation of the educational journals, and the fact that transla- 
tions into French and Italian have already appeared. 

The treatment of the subject is elementary, not even a
knowledge of the differential and integral calculus being
required. Among the questions answered are such as these:
Under what circumstances is a geometric construction pos-
sible? By what means can it be effected? What are tran-
scendental numbers? How can we prove that e and $\pi$ are
transcendental?

With the belief that an English presentation of so impor-
tant a work would appeal to many unable to read the original,
Professor Klein’s consent to a translation was sought and 
readily secured. 

In its preparation the authors have also made free use of 
the French translation by Professor J. Griess, of Algiers, 
following its modifications where it seemed advisable. 

They desire further to thank Professor Ziwet for assist- 
ance in improving the translation and in reading the proof- 
sheets. 


August, 1897. 


W. W. BEMAN. 
D. E. SMITH. 


EDITOR’S PREFACE. 


Within three years of its publication thirty-five years
ago Klein’s little work was translated into English, French,
Italian, and Russian¹. In the United States it filled a decided
need for many years, and not a few teachers regretted 
that the work was allowed to go out of print. No other 
work supplied in such compact form just the information 
here found. Hence it seemed desirable to have a new 
edition with at least some of the slips of the first edition 
rectified, and with added notes illuminating the text. 

The corrections and notes of the present edition are 
little more than revised extracts from my article in The
American Mathematical Monthly², 1914. I am indebted to
the Editors for courteously allowing the reproduction of 
this material. 

R. C. A. 

February, 1930.


¹ French translation by Griess, Paris, Nony, 1896; Italian by Giudice,
Turin, Rosenberg e Sellier, 1896; Russian by Parfentiev and Sintsov,
Kazan, 1898. This last translation seems to have been unknown to the
editors of Klein’s Abhandlungen (see v. 3, 1923, p. 28).

² Remarks on Klein’s “Famous Problems of Elementary Geometry,”
v. 21, p. 247-259.













CONTENTS.


INTRODUCTION.

Practical and Theoretical Constructions
Statement of the Problem in Algebraic Form

PART I.

The Possibility of the Construction of Algebraic Expressions.

Chapter I. Algebraic Equations Solvable by Square Roots.

1-4. Structure of the expression x to be constructed
5, 6. Normal form of x
7, 8. Conjugate values
9. The corresponding equation $F(x) = 0$
10. Other rational equations $f(x) = 0$
11, 12. The irreducible equation $\phi(x) = 0$
13, 14. The degree of the irreducible equation a power of 2

Chapter II. The Delian Problem and the Trisection of the Angle.

1. The impossibility of solving the Delian problem with straight edge and compasses
2. The general equation $x^3 = \lambda$
3. The impossibility of trisecting an angle with straight edge and compasses

Chapter III. The Division of the Circle into Equal Parts.

1. History of the problem
2-4. Gauss’s prime numbers
5. The cyclotomic equation
6. Gauss’s Lemma
7, 8. The irreducibility of the cyclotomic equation

Chapter IV. The Construction of the Regular Polygon of 17 Sides.

1. Algebraic statement of the problem
2-4. The periods formed from the roots
5, 6. The quadratic equations satisfied by the periods
7. Historical account of constructions with straight edge and compasses
8, 9. Von Staudt’s construction of the regular polygon of 17 sides

Chapter V. General Considerations on Algebraic Constructions.

1. Paper folding
2. The conic sections
3. The Cissoid of Diocles
4. The Conchoid of Nicomedes
5. Mechanical devices

PART II.

Transcendental Numbers and the Quadrature of the Circle.

Chapter I. Cantor’s Demonstration of the Existence of Transcendental Numbers.

1. Definition of algebraic and of transcendental numbers
2. Arrangement of algebraic numbers according to height
3. Demonstration of the existence of transcendental numbers

Chapter II. Historical Survey of the Attempts at the Computation and Construction of $\pi$.

1. The empirical stage
2. The Greek mathematicians
3. Modern analysis from 1670 to 1770
4, 5. Revival of critical rigor since 1770

Chapter III. The Transcendence of the Number e.

1. Outline of the demonstration
2. The symbol $h^r$ and the function $\phi(x)$
3. Hermite’s Theorem

Chapter IV. The Transcendence of the Number $\pi$.

1. Outline of the demonstration
2. The function $\psi(x)$
3. Lindemann’s Theorem
4. Lindemann’s Corollary
5. The transcendence of $\pi$
6. The transcendence of $y = e^x$
7. The transcendence of $y = \sin^{-1} x$

Chapter V. The Integraph and the Geometric Construction of $\pi$.

1. The impossibility of the quadrature of the circle with straight edge and compasses
2. Principle of the integraph
3. Geometric construction of $\pi$

Notes



INTRODUCTION. 


This course of lectures is due to the desire on my part to 
bring the study of mathematics in the university into closer 
touch with the needs of the secondary schools. Still it is not 
intended for beginners, since the matters under discussion are 
treated from a higher standpoint than that of the schools. 
On the other hand, it presupposes but little preliminary work, 
only the elements of analysis being required, as, for example, 
in the development of the exponential function into a series. 

We propose to treat of geometrical constructions, and our 
object will not be so much to find the solution suited to each 
case as to determine the possibility or impossibility of a 
solution. 

Three problems, the object of much research in ancient 
times, will prove to be of special interest. They are 

1. The problem of the duplication of the cube (also called 
the Delian problem).

2. The trisection of an arbitrary angle. 

3. The quadrature of the circle, i.e., the construction of $\pi$.

In all these problems the ancients sought in vain for a 
solution with straight edge and compasses, and the celebrity 
of these problems is due chiefly to the fact that their solution 
seemed to demand the use of appliances of a higher order. 
In fact, we propose to show that a solution by the use of 
straight edge and compasses is impossible. 




The impossibility of the solution of the third problem was 
demonstrated only very recently. That of the first and second 
is implicitly involved in the Galois theory as presented to-day 
in treatises on higher algebra. On the other hand, we find 
no explicit demonstration in elementary form unless it be in 
Petersen’s text-books, works which are also noteworthy in 
other respects. 

At the outset we must insist upon the difference between 
practical and theoretical constructions. For example, if we 
need a divided circle as a measuring instrument, we construct 
it simply on trial. Theoretically, in earlier times, it was
possible (i.e., by the use of straight edge and compasses) only
to divide the circle into a number of parts represented by
$2^h$, 3, and 5, and their products. Gauss added other cases
by showing the possibility of the division into $p$ parts where
$p$ is a prime number of the form $p = 2^{2^\mu} + 1$, and the impos-
sibility for all other prime numbers. No practical advantage is
derived from these results; the significance of Gauss’s de- 
velopments is purely theoretical. The same is true of all the 
discussions of the present course. 

Our fundamental problem may be stated: What geometrical 
constructions are, and what are not, theoretically possible? To 
define sharply the meaning of the word “construction,” we 
must designate the instruments which we propose to use in 
each case. We shall consider 

1. Straight edge and compasses, 

2. Compasses alone, 

3. Straight edge alone, 

4. Other instruments used in connection with straight edge 
and compasses. 

The singular thing is that elementary geometry furnishes 
no answer to the question. We must fall back upon algebra 
and the higher analysis. The question then arises : How 




shall we use the language of these sciences to express the 
employment of straight edge and compasses? This new 
method of attack is rendered necessary because elementary 
geometry possesses no general method, no algorithm, as do
the last two sciences. 

In analysis we have first rational operations : addition, 
subtraction, multiplication, and division. These operations 
can be directly effected geometrically upon two given seg- 
ments by the aid of proportions, if, in the case of multiplica- 
tion and division, we introduce an auxiliary unit-segment. 

Further, there are irrational operations, subdivided into 
algebraic and transcendental. The simplest algebraic opera- 
tions are the extraction of square and higher roots, and the 
solution of algebraic equations not solvable by radicals, such 
as those of the fifth and higher degrees. As we know how to 
construct $\sqrt{ab}$, rational operations in general, and irrational
operations involving only square roots, can be constructed. 
On the other hand, every individual geometrical construction 
which can be reduced to the intersection of two straight 
lines, a straight line and a circle, or two circles, is equivalent 
to a rational operation or the extraction of a square root. In 
the higher irrational operations the construction is therefore 
impossible, unless we can find a way of effecting it by the aid 
of square roots. In all these constructions it is obvious that 
the number of operations must be limited. 
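
The construction of $\sqrt{ab}$ mentioned above can be checked numerically. The following is only a sketch, assuming the classical semicircle construction (a circle on the diameter a + b, cut by the perpendicular erected at the point dividing the diameter into a and b); the function name is hypothetical and is not taken from the text.

```python
import math

def geometric_mean_by_construction(a: float, b: float) -> float:
    """Height of the circle over the diameter a + b at the point splitting it into a and b."""
    # Assumed set-up: the diameter lies on the x-axis from 0 to a + b.
    center = (a + b) / 2.0
    radius = (a + b) / 2.0
    # The perpendicular at x = a meets the circle where
    # (x - center)^2 + y^2 = radius^2; solving for y costs one square root.
    y_squared = radius ** 2 - (a - center) ** 2
    return math.sqrt(y_squared)

# Compare the constructed height with sqrt(a * b) for a few segment pairs.
for a, b in [(1.0, 4.0), (2.0, 3.0), (0.5, 8.0)]:
    print(a, b, geometric_mean_by_construction(a, b), math.sqrt(a * b))
```

The intersection of a straight line with a circle thus costs exactly one square root extraction, which is the kind of step the paragraph describes.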

We may therefore state the following fundamental theorem : 
The necessary and sufficient condition that an analytic expres- 
sion can be constructed with straight edge and compasses is that 
it can be derived from the known quantities by a finite number 
of rational operations and square roots. 

Accordingly, if we wish to show that a quantity cannot be 
constructed with straight edge and compasses, we must prove 
that the corresponding equation is not solvable by a finite 
number of square roots. 




A fortiori the solution is impossible when the problem 
has no corresponding algebraic equation. An expression 
which satisfies no algebraic equation is called a transcenden- 
tal number. This case occurs, as we shall show, with the 
number $\pi$.


PART I. 


THE POSSIBILITY OF THE CONSTRUCTION OF ALGEBRAIC
EXPRESSIONS.


CHAPTER I. 

Algebraic Equations Solvable by Square Roots. 

The following propositions taken from the theory of alge- 
braic equations are probably known to the reader, yet to 
secure greater clearness of view we shall give brief demon- 
strations. 

If x, the quantity to be constructed, depends only upon rational
expressions and square roots, it is a root of an irreducible equa-
tion $\phi(x) = 0$, whose degree is always a power of 2.

1. To get a clear idea of the structure of the quantity x,
suppose it, e.g., of the form

$$x = \frac{\sqrt{a + \sqrt{c + ef}} + \sqrt{d + \sqrt{b}}}{\sqrt{a} + \sqrt{b}} + \frac{p + \sqrt{q}}{\sqrt{r}},$$

where a, b, c, d, e, f, p, q, r are rational expressions.

2. The number of radicals one over another occurring in
any term of x is called the order of the term; the preceding
expression contains terms of orders 0, 1, 2.

3. Let $\mu$ designate the maximum order, so that no term
can have more than $\mu$ radicals one over another.




4. In the example $x = \sqrt{2} + \sqrt{3} + \sqrt{6}$, we have three
expressions of the first order, but as it may be written

$$x = \sqrt{2} + \sqrt{3} + \sqrt{2} \cdot \sqrt{3},$$

it really depends on only two distinct expressions.

We shall suppose that this reduction has been made in all the
terms of x, so that among the n terms of order $\mu$ none can be
expressed rationally as a function of any other terms of order $\mu$
or of lower order.

We shall make the same supposition regarding terms of
the order $\mu - 1$ or of lower order, whether these occur ex-
plicitly or implicitly. This hypothesis is obviously a very
natural one and of great importance in later discussions.

5. Normal Form of x.

If the expression x is a sum of terms with different denom-
inators we may reduce them to the same denominator and
thus obtain x as the quotient of two integral functions.

Suppose $\sqrt{Q}$ one of the terms of x of order $\mu$; it can occur
in x only explicitly, since $\mu$ is the maximum order. Since,
further, the powers of $\sqrt{Q}$ may be expressed as functions of
$\sqrt{Q}$ and Q, which is a term of lower order, we may put

$$x = \frac{a + b\sqrt{Q}}{c + d\sqrt{Q}},$$

where a, b, c, d contain no more than $n - 1$ terms of order $\mu$,
besides terms of lower order.

Multiplying both terms of the fraction by $c - d\sqrt{Q}$, $\sqrt{Q}$
disappears from the denominator, and we may write

$$x = \frac{(ac - bdQ) + (bc - ad)\sqrt{Q}}{c^2 - d^2 Q} = \alpha + \beta\sqrt{Q},$$

where $\alpha$ and $\beta$ contain no more than $n - 1$ terms of order $\mu$.

For a second term of order $\mu$ we have, in a similar manner,
$x = \alpha_1 + \beta_1\sqrt{Q_1}$, etc.




The x may, therefore, be transformed so as to contain a term
of given order $\mu$ only in its numerator and there only linearly.

We observe, however, that products of terms of order $\mu$
may occur, for $\alpha$ and $\beta$ still depend upon $n - 1$ terms of order
$\mu$. We may, then, put

$$\alpha = \alpha_{11} + \alpha_{12}\sqrt{Q_1}, \qquad \beta = \beta_{11} + \beta_{12}\sqrt{Q_1},$$

and hence

$$x = (\alpha_{11} + \alpha_{12}\sqrt{Q_1}) + (\beta_{11} + \beta_{12}\sqrt{Q_1})\sqrt{Q}.$$
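
The removal of a square root from the denominator in section 5 can be tried out on numbers. This is a minimal sketch under a simplifying assumption: a, b, c, d and Q are taken to be plain rational numbers, whereas in the text they may themselves contain radicals. The function name `normal_form` is hypothetical.

```python
from fractions import Fraction

def normal_form(a, b, c, d, Q):
    """Return (alpha, beta) with (a + b*sqrt(Q)) / (c + d*sqrt(Q)) = alpha + beta*sqrt(Q)."""
    a, b, c, d, Q = (Fraction(v) for v in (a, b, c, d, Q))
    # Multiplying numerator and denominator by c - d*sqrt(Q)
    # gives the rational denominator c^2 - d^2 * Q.
    denom = c * c - d * d * Q
    if denom == 0:
        raise ZeroDivisionError("c^2 - d^2*Q vanishes")
    alpha = (a * c - b * d * Q) / denom
    beta = (b * c - a * d) / denom
    return alpha, beta

# (1 + 2*sqrt(2)) / (3 - sqrt(2)) reduces to 1 + sqrt(2).
print(normal_form(1, 2, 3, -1, 2))
```

Exact fractions are used so that the reduced coefficients stay rational, as the normal form requires.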

6. We proceed in a similar way with the different terms
of order $\mu - 1$, which occur explicitly and in $Q$, $Q_1$, etc., so
that each of these quantities becomes an integral linear func-
tion of the term of order $\mu - 1$ under consideration. We
then pass on to terms of lower order and finally obtain x, or 
rather its terms of different orders, under the form of rational 
integral linear functions of the individual radical expressions 
which occur explicitly. We then say that x is reduced to 
the normal form. 

7. Let m be the total number of independent (4) square
roots occurring in this normal form. Giving the double sign
to these square roots and combining them in all possible ways,
we obtain a system of $2^m$ values

$$x_1, x_2, \ldots, x_{2^m},$$

which we shall call conjugate values.

We must now investigate the equation admitting these 
conjugate values as roots. 

8. These values are not necessarily all distinct; thus, if
we have

$$x = \sqrt{a + \sqrt{b}} + \sqrt{a - \sqrt{b}},$$

this expression is not changed when we change the sign of
$\sqrt{b}$.




9. If x is an arbitrary quantity and we form the poly-
nomial

$$F(x) = (x - x_1)(x - x_2) \cdots (x - x_{2^m}),$$

$F(x) = 0$ is clearly an equation having as roots these con-
jugate values. It is of degree $2^m$, but may have equal
roots (8).

The coefficients of the polynomial F(x) arranged with respect
to x are rational.

For let us change the sign of one of the square roots; this
will permute two roots, say $x_\lambda$ and $x_{\lambda'}$, since the roots of
$F(x) = 0$ are precisely all the conjugate values. As these
roots enter F(x) only under the form of the product

$$(x - x_\lambda)(x - x_{\lambda'}),$$

we merely change the order of the factors of F(x). Hence
the polynomial is not changed.

F(x) remains, then, invariable when we change the sign of
any one of the square roots; it therefore contains only their
squares; and hence F(x) has only rational coefficients.
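
As an illustration of the statement that F(x) has rational coefficients, the conjugate values can be enumerated numerically. This is a sketch only, assuming the simple case in which x is a sum of independent first-order square roots; the example $\sqrt{2} + \sqrt{3}$ and both function names are illustrative choices, not taken from the text.

```python
import itertools
import math

def conjugate_values(radicands):
    """All 2^m values of x = sum of +/- sqrt(q), one sign choice per radicand."""
    roots = [math.sqrt(q) for q in radicands]
    return [sum(s * r for s, r in zip(signs, roots))
            for signs in itertools.product((1, -1), repeat=len(roots))]

def poly_from_roots(values):
    """Coefficients, highest degree first, of the product of (x - v) over all v."""
    coeffs = [1.0]
    for v in values:
        # Multiply the current polynomial by (x - v).
        new = coeffs + [0.0]
        for i, c in enumerate(coeffs):
            new[i + 1] -= v * c
        coeffs = new
    return coeffs

# m = 2 independent square roots give 2^2 = 4 conjugate values.
F = poly_from_roots(conjugate_values([2, 3]))
print([round(c) for c in F])  # -> [1, 0, -10, 0, 1]
```

The irrational parts cancel in the product, leaving $x^4 - 10x^2 + 1$, up to floating-point rounding.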

10. When any one of the conjugate values satisfies a given
equation with rational coefficients, f(x) = 0, the same is true of
all the others.

f(x) is not necessarily equal to F(x), and may admit other
roots besides the $x_i$'s.

Let $x_1 = \alpha + \beta\sqrt{Q}$ be one of the conjugate values; $\sqrt{Q}$, a
term of order $\mu$; $\alpha$ and $\beta$ now depend only upon other terms
of order $\mu$ and terms of lower order. There must, then, be a
conjugate value

$$x_1' = \alpha - \beta\sqrt{Q}.$$

Let us now form the equation $f(x_1) = 0$. $f(x_1)$ may be put
into the normal form with respect to $\sqrt{Q}$,

$$f(x_1) = A + B\sqrt{Q};$$

this expression can equal zero only when A and B are simul-
taneously zero. Otherwise we should have

$$\sqrt{Q} = -\frac{A}{B},$$

i.e., $\sqrt{Q}$ could be expressed rationally as a function of terms
of order $\mu$ and of terms of lower order contained in A and B,
which is contrary to the hypothesis of the independence of
all the square roots (4).

But we evidently have

$$f(x_1') = A - B\sqrt{Q};$$

hence if $f(x_1) = 0$, so also $f(x_1') = 0$. Whence the following
proposition:

If $x_1$ satisfies the equation f(x) = 0, the same is true of all
the conjugate values derived from $x_1$ by changing the signs of
the roots of order $\mu$.

The proof for the other conjugate values is obtained in an
analogous manner. Suppose, for example, as may be done
without affecting the generality of the reasoning, that the
expression $x_1$ depends on only two terms of order $\mu$, $\sqrt{Q}$ and
$\sqrt{Q'}$. $f(x_1)$ may be reduced to the following normal form:

(a) $f(x_1) = p + q\sqrt{Q} + r\sqrt{Q'} + s\sqrt{Q} \cdot \sqrt{Q'} = 0.$

If $x_1$ depended on more than two terms of order $\mu$, we should
only have to add to the preceding expression a greater num-
ber of terms of analogous structure.

Equation (a) is possible only when we have separately

(b) $p = 0,\ q = 0,\ r = 0,\ s = 0.$

Otherwise $\sqrt{Q}$ and $\sqrt{Q'}$ would be connected by a rational
relation, contrary to our hypothesis.

Let now $\sqrt{R}$, $\sqrt{R'}$, ... be the terms of order $\mu - 1$ on
which $x_1$ depends; they occur in p, q, r, s; then can the
quantities p, q, r, s, in which they occur, be reduced to the
normal form with respect to $\sqrt{R}$ and $\sqrt{R'}$; and if, for the
sake of simplicity, we take only two quantities, $\sqrt{R}$ and $\sqrt{R'}$,
we have

(c) $p = \kappa_1 + \lambda_1\sqrt{R} + \mu_1\sqrt{R'} + \nu_1\sqrt{R} \cdot \sqrt{R'} = 0,$

and three analogous equations for q, r, s.

The hypothesis, already used several times, of the inde-
pendence of the roots, furnishes the equations

(d) $\kappa = 0,\ \lambda = 0,\ \mu = 0,\ \nu = 0.$

Hence equations (c) and consequently f(x) = 0 are satisfied
when for $x_1$ we substitute the conjugate values deduced by
changing the signs of $\sqrt{R}$ and $\sqrt{R'}$.

Therefore the equation f(x) = 0 is also satisfied by all the
conjugate values deduced from $x_1$ by changing the signs of the
roots of order $\mu - 1$.

The same reasoning is applicable to the terms of order
$\mu - 2$, $\mu - 3$, ... and our theorem is completely proved.

11. We have so far considered two equations

$$F(x) = 0 \quad \text{and} \quad f(x) = 0.$$

Both have rational coefficients and contain the $x_i$'s as roots.
F(x) is of degree $2^m$ and may have multiple roots; f(x) may
have other roots besides the $x_i$'s. We now introduce a third
equation, $\phi(x) = 0$, defined as the equation of lowest degree,
with rational coefficients, admitting the root $x_1$ and conse-
quently all the $x_i$'s (10).

12. Properties of the Equation $\phi(x) = 0$.

I. $\phi(x) = 0$ is an irreducible equation, i.e., $\phi(x)$ cannot be
resolved into two rational polynomial factors. This irreduci-
bility is due to the hypothesis that $\phi(x) = 0$ is the rational
equation of lowest degree satisfied by the $x_i$'s.

For if we had

$$\phi(x) = \psi(x)\,\chi(x),$$

then $\phi(x_1) = 0$ would require either $\psi(x_1) = 0$, or $\chi(x_1) = 0$,
or both. But since these equations are satisfied by all the
conjugate values (10), $\phi(x) = 0$ would not then be the equa-
tion of lowest degree satisfied by the $x_i$'s.

II. $\phi(x) = 0$ has no multiple roots. Otherwise $\phi(x)$ could
be decomposed into rational factors by the well-known meth-
ods of Algebra, and $\phi(x) = 0$ would not be irreducible.

III. $\phi(x) = 0$ has no other roots than the $x_i$'s. Otherwise
F(x) and $\phi(x)$ would admit a highest common divisor, which
could be determined rationally. We could then decompose
$\phi(x)$ into rational factors, and $\phi(x)$ would not be irreducible.

IV. Let M be the number of $x_i$'s which have distinct values,
and let

$$x_1, x_2, \ldots, x_M$$

be these quantities. We shall then have

$$\phi(x) = C(x - x_1)(x - x_2) \cdots (x - x_M).$$

For $\phi(x) = 0$ is satisfied by the quantities $x_i$ and it has no
multiple roots. The polynomial $\phi(x)$ is then determined save
for a constant factor whose value has no effect upon $\phi(x) = 0$.

V. $\phi(x) = 0$ is the only irreducible equation with rational
coefficients satisfied by the $x_i$'s. For if f(x) = 0 were another
rational irreducible equation satisfied by $x_1$ and consequently
by the $x_i$'s, f(x) would be divisible by $\phi(x)$ and therefore
would not be irreducible.

By reason of the five properties of $\phi(x) = 0$ thus estab-
lished, we may designate this equation, in short, as the irre-
ducible equation satisfied by the $x_i$'s.

13. Let us now compare F(x) and $\phi(x)$. These two poly-
nomials have the $x_i$'s as their only roots, and $\phi(x)$ has no
multiple roots. F(x) is, then, divisible by $\phi(x)$; that is,

$$F(x) = F_1(x)\,\phi(x).$$

$F_1(x)$ necessarily has rational coefficients, since it is the quo-
tient obtained by dividing F(x) by $\phi(x)$. If $F_1(x)$ is not a
constant it admits roots belonging to F(x); and admitting
one it admits all the $x_i$'s (10). Hence $F_1(x)$ is also divisible
by $\phi(x)$, and

$$F_1(x) = F_2(x)\,\phi(x).$$

If $F_2(x)$ is not a constant the same reasoning still holds, the
degree of the quotient being lowered by each operation.
Hence at the end of a limited number of divisions we reach
an equation of the form

$$F_{\nu-1}(x) = C_1 \cdot \phi(x),$$

and for F(x),

$$F(x) = C_1 \cdot [\phi(x)]^{\nu}.$$

The polynomial F(x) is then a power of the polynomial of
minimum degree $\phi(x)$, except for a constant factor.

14. We can now determine the degree M of $\phi(x)$. F(x)
is of degree $2^m$; further, it is the $\nu$th power of $\phi(x)$. Hence

$$2^m = \nu \cdot M.$$

Therefore M is also a power of 2 and we obtain the following
theorem:

The degree of the irreducible equation satisfied by an expres-
sion composed of square roots only is always a power of 2.

15. Since, on the other hand, there is only one irreducible
equation satisfied by all the $x_i$'s (12, V.), we have the converse
theorem:

If an irreducible equation is not of degree $2^h$, it cannot be
solved by square roots.


CHAPTER II. 


The Delian Problem and the Trisection of the Angle. 

1. Let us now apply the general theorem of the preceding
chapter to the Delian problem, i.e., to the problem of the
duplication of the cube. The equation of the problem is
manifestly

$$x^3 = 2.$$

This is irreducible, since otherwise $\sqrt[3]{2}$ would have a
rational value. For an equation of the third degree which is
reducible must have a rational linear factor. Further, the
degree of the equation is not of the form $2^h$; hence it cannot
be solved by means of square roots, and the geometric con-
struction with straight edge and compasses is impossible.
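
The claim that the cubic for the duplication of the cube has no rational linear factor can be checked mechanically. The sketch below assumes the standard rational root test (any rational root p/q in lowest terms has p dividing the constant term and q dividing the leading coefficient); that test and the helper names are not part of the text.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of an integer polynomial (highest degree first, nonzero constant term)."""
    lead, const = coeffs[0], coeffs[-1]
    candidates = {Fraction(s * p, q)
                  for p in divisors(const)
                  for q in divisors(lead)
                  for s in (1, -1)}

    def value(x):
        # Horner evaluation in exact rational arithmetic.
        acc = Fraction(0)
        for c in coeffs:
            acc = acc * x + c
        return acc

    return sorted(x for x in candidates if value(x) == 0)

print(rational_roots([1, 0, 0, -2]))  # x^3 - 2: no rational root
print(rational_roots([1, 0, 0, -8]))  # x^3 - 8: the rational root 2
```

An empty result for $x^3 - 2$ means there is no rational linear factor, so the cubic is irreducible over the rational numbers, and 3 is not a power of 2.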

2. Next let us consider the more general equation

$$x^3 = \lambda,$$

$\lambda$ designating a parameter which may be a complex quantity
of the form a + ib. This equation furnishes us the analyt-
ical expressions for the geometrical problems of the multi-
plication of the cube and the trisection of an arbitrary angle.
The question arises whether this equation is reducible, i.e.,
whether one of its roots can be expressed as a rational func-
tion of $\lambda$. It should be remarked that the irreducibility of
an expression always depends upon the values of the quan-
tities supposed to be known. In the case $x^3 = 2$, we were
dealing with numerical quantities, and the question was
whether $\sqrt[3]{2}$ could have a rational numerical value. In the
equation $x^3 = \lambda$ we ask whether a root can be represented by
a rational function of $\lambda$. In the first case, the so-called
domain of rationality comprehends the totality of rational
numbers; in the second, it is made up of the rational func-
tions of a parameter. If no limitation is placed upon this
parameter we see at once that no expression of the form
$\phi(\lambda)/\psi(\lambda)$, in which $\phi(\lambda)$ and $\psi(\lambda)$ are polynomials, can satisfy our
equation. Under our hypothesis the equation is therefore
irreducible, and since its degree is not of the form $2^h$, it can-
not be solved by square roots.



3. Let us now restrict the variability of $\lambda$. Assume

$$\lambda = r(\cos\phi + i\sin\phi);$$

whence

$$\sqrt[3]{\lambda} = \sqrt[3]{r} \cdot \sqrt[3]{\cos\phi + i\sin\phi}.$$

Our problem resolves itself into two, to extract the cube root
of a real number and also that of a complex number of the
form $\cos\phi + i\sin\phi$, both numbers being regarded as arbitrary.
We shall treat these separately.

[Fig. 1.]

I. The roots of the equation $x^3 = r$ are

$$\sqrt[3]{r}, \quad \epsilon\sqrt[3]{r}, \quad \epsilon^2\sqrt[3]{r},$$

representing by $\epsilon$ and $\epsilon^2$ the complex cube roots of unity

$$\epsilon = \frac{-1 + i\sqrt{3}}{2}, \qquad \epsilon^2 = \frac{-1 - i\sqrt{3}}{2}.$$

Taking for the domain of rationality the totality of rational
functions of r, we know by the previous reasoning that the
equation $x^3 = r$ is irreducible. Hence the problem of the
multiplication of the cube does not admit, in general, of a
construction by means of straight edge and compasses.

II. The roots of the equation

$$x^3 = \cos\phi + i\sin\phi$$

are, by De Moivre’s formula,

$$x_1 = \cos\frac{\phi}{3} + i\sin\frac{\phi}{3}, \quad x_2 = \cos\frac{\phi + 2\pi}{3} + i\sin\frac{\phi + 2\pi}{3}, \quad x_3 = \cos\frac{\phi + 4\pi}{3} + i\sin\frac{\phi + 4\pi}{3}.$$

These roots are represented geometrically by the vertices of
an equilateral triangle inscribed in the circle with radius
unity and center at the origin. The figure shows that to the
root $x_1$ corresponds the argument $\frac{\phi}{3}$. Hence the equation

$$x^3 = \cos\phi + i\sin\phi$$

is the analytic expression of the problem of the trisection of
the angle.

[Figure: the three roots on the unit circle.]

If this equation were reducible, one, at least, of its roots
could be represented as a rational function of $\cos\phi$ and
$\sin\phi$, its value remaining unchanged on substituting
$\phi + 2\pi$ for $\phi$. But if we effect this change
by a continuous variation of the angle $\phi$, we see that the
roots $x_1$, $x_2$, $x_3$ undergo a cyclic permutation. Hence no root
can be represented as a rational function of $\cos\phi$ and $\sin\phi$.
The equation under consideration is irreducible and therefore
cannot be solved by the aid of a finite number of square roots.
Hence the trisection of the angle cannot be effected with straight
edge and compasses.
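
The cyclic permutation of the three roots under a full turn of the angle can be observed numerically. This is a sketch assuming the roots are numbered as De Moivre’s formula suggests, with arguments (φ + 2kπ)/3 for k = 0, 1, 2; the function names and the zero-based indices are illustrative.

```python
import cmath
import math

def cube_roots(phi):
    """The three roots of x^3 = cos(phi) + i*sin(phi), with arguments (phi + 2*k*pi)/3."""
    return [cmath.exp(1j * (phi + 2 * math.pi * k) / 3) for k in range(3)]

def permutation_after_full_turn(phi):
    """For each root index k at angle phi + 2*pi, the index of the equal root at angle phi."""
    before = cube_roots(phi)
    after = cube_roots(phi + 2 * math.pi)
    return [min(range(3), key=lambda j: abs(after[k] - before[j])) for k in range(3)]

# cos and sin are unchanged by the full turn, yet every root moves to the next one.
print(permutation_after_full_turn(0.7))  # -> [1, 2, 0]
```

A rational function of cos φ and sin φ takes the same value at φ and φ + 2π, so it cannot follow a root that changes places in this way.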

This demonstration and the general theorem evidently hold
good only when $\phi$ is an arbitrary angle; but for certain spe-
cial values of $\phi$ the construction may prove to be possible,
e.g., when $\phi = \frac{\pi}{2}$.


CHAPTER III.


The Division of the Circle into Equal Parts. 

1. The problem of dividing a given circle into n equal
parts has come down from antiquity; for a long time we
have known the possibility of solving it when $n = 2^h$, 3, 5, or
the product of any two or three of these numbers. In his
Disquisitiones Arithmeticae, Gauss extended this series of
numbers by showing that the division is possible for every
prime number of the form $p = 2^{2^\mu} + 1$ but impossible for all
other prime numbers and their powers. If in $p = 2^{2^\mu} + 1$
we make $\mu = 0$ and 1, we get p = 3 and 5, cases already
known to the ancients. For $\mu = 2$ we get $p = 2^{2^2} + 1 = 17$,
a case completely discussed by Gauss.

For $\mu = 3$ we get $p = 2^{2^3} + 1 = 257$, likewise a prime
number. The regular polygon of 257 sides can be constructed.
Similarly for $\mu = 4$, since $2^{2^4} + 1 = 65537$ is a prime number.
$\mu$ = 5, 6, 7, 8, 9, 11, 12, 15, 18, 23, 36, 38, 73 give no prime
numbers. The proof that the large numbers corresponding to
$\mu$ = 5, 6, . . . , 73 are not prime has required a large
expenditure of labor and ingenuity. It is, therefore, quite
possible that $\mu = 4$ is the last number for which a solution
can be effected.
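
The first few cases can be checked directly. The sketch below uses plain trial division, which is adequate only for these small values; the function names are illustrative, and the divisor 641 of the case $\mu = 5$ is Euler's well-known factor, which the text itself does not state.

```python
def fermat_number(mu):
    """Return 2^(2^mu) + 1."""
    return 2 ** (2 ** mu) + 1

def is_prime(n):
    """Trial division; fine for numbers up to 65537."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

for mu in range(5):
    print(mu, fermat_number(mu), is_prime(fermat_number(mu)))

# The case mu = 5 is composite: 641 divides 2^32 + 1.
print(fermat_number(5) % 641)
```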

Upon the regular polygon of 257 sides Richelot published
an extended investigation in Crelle's Journal, IX, 1832,
pp. 1-26, 146-161, 209-230, 337-356. The title of the
memoir is: De resolutione algebraica aequationis $x^{257} = 1$, sive
de divisione circuli per bisectionem anguli septies repetitam in
partes 257 inter se aequales commentatio coronata.




To the regular polygon of 65537 sides Professor Hermes
of Lingen devoted ten years of his life, examining with care
all the roots furnished by Gauss's method. His MSS. are
preserved in the collection of the mathematical seminary in
Göttingen. (Compare a communication of Professor Hermes
in No. 3 of the Göttinger Nachrichten for 1894.)

2. We may restrict the problem of the division of the
circle into n equal parts to the cases where n is a prime
number p or a power $p^\alpha$ of such a number. For if n is a
composite number and if $\mu$ and $\nu$ are factors of n, prime to
each other, we can always find integers a and b, positive or
negative, such that

$$1 = a\mu + b\nu;$$

whence

$$\frac{1}{\mu\nu} = \frac{a}{\nu} + \frac{b}{\mu}.$$

To divide the circle into $\mu\nu = n$ equal parts it is sufficient to
know how to divide it into $\mu$ and $\nu$ equal parts respectively.
Thus, for n = 15, we have

$$\frac{1}{15} = \frac{2}{3} - \frac{3}{5}.$$
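
The integers a and b can be found by the extended Euclidean algorithm. The following is a minimal sketch; the name `bezout` is an assumption of this example. The pair it returns need not coincide with the one used in the text, since a and b are not unique.

```python
from fractions import Fraction

def bezout(mu, nu):
    """Return integers (a, b) with a*mu + b*nu == gcd(mu, nu)."""
    old_r, r = mu, nu
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_a, a = a, old_a - q * a
        old_b, b = b, old_b - q * b
    return old_a, old_b

mu, nu = 3, 5
a, b = bezout(mu, nu)
# 1/(mu*nu) = a/nu + b/mu, as in the text
print(a, b, Fraction(a, nu) + Fraction(b, mu))
```

For $\mu = 3$, $\nu = 5$ this yields a decomposition of 1/15 into a multiple of 1/5 and a multiple of 1/3, which is all the geometric argument requires.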

3. As will appear, the division into p equal parts (p being
a prime number) is possible only when p is of the form
$p = 2^h + 1$. We shall next show that a prime number can
be of this form only when $h = 2^\mu$. For this we shall make
use of Fermat's Theorem:

If p is a prime number and a an integer not divisible by p,
these numbers satisfy the congruence

$$a^{p-1} \equiv +1 \pmod{p}.$$

p — 1 is not necessarily the lowest exponent which, for a 
given value of a, satisfies the congruence. If s is the lowest 
exponent it may be shown that s is a divisor of p — 1. In 
particular, if s = p — 1 we say that a is a primitive root of p, 




and notice that for every prime number p there is a primitive 
root. We shall make use of this notion further on. 

Suppose, then, p a prime number such that

(1) $p = 2^h + 1$,

and s the least integer satisfying

(2) $2^s \equiv +1 \pmod{p}$.

From (1) $2^h < p$; from (2) $2^s > p$.

$\therefore\ s > h$.

(1) shows that h is the least integer satisfying the congruence

(3) $2^h \equiv -1 \pmod{p}$.

From (2) and (3), by division,

$$2^{s-h} \equiv -1 \pmod{p}.$$

(4) $\therefore\ s - h \nless h,\quad s \nless 2h$.

From (3), by squaring,

$$2^{2h} \equiv 1 \pmod{p}.$$

Comparing with (2) and observing that s is the least exponent
satisfying congruences of the form

$$2^x \equiv 1 \pmod{p},$$

we have

(5) $s \ngtr 2h$.

$\therefore\ s = 2h$.

We have observed that s is a divisor of $p - 1 = 2^h$; the same
is true of h, which is, therefore, a power of 2. Hence prime
numbers of the form $2^h + 1$ are necessarily of the form
$2^{2^\mu} + 1$.
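
Both conclusions, that h must be a power of 2 and that the least exponent s equals 2h, can be checked for small h. This is only a sketch under the assumption that trial division is fast enough for $h \le 20$; the function names are illustrative.

```python
def is_prime(n):
    """Trial division; adequate for n up to about 2^20."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def order_of_two(p):
    """Least s >= 1 with 2^s congruent to 1 modulo the odd prime p."""
    s, power = 1, 2 % p
    while power != 1:
        power = (power * 2) % p
        s += 1
    return s

# Pairs (h, s) for every h <= 20 such that 2^h + 1 is prime.
results = [(h, order_of_two(2 ** h + 1))
           for h in range(1, 21) if is_prime(2 ** h + 1)]
print(results)
```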

4. This conclusion may be established otherwise. Suppose
that h is divisible by an odd number, so that

$$h = h'(2n + 1);$$

then, by reason of the formula

$$x^{2n+1} + 1 = (x + 1)(x^{2n} - x^{2n-1} + \dots - x + 1),$$

$p = 2^{h'(2n+1)} + 1$ is divisible by $2^{h'} + 1$, and hence is not a
prime number.

5. We now reach our fundamental proposition:
p being a prime number, the division of the circle into p equal
parts by the straight edge and compasses is impossible unless p
is of the form

$$p = 2^h + 1 = 2^{2^\mu} + 1.$$

Let us trace in the z-plane ($z = x + iy$) a circle of radius 1.
To divide this circle into n equal parts, beginning at z = 1, is
the same as to solve the equation

$$z^n - 1 = 0.$$

This equation admits the root z = 1; let us suppress this root
by dividing by z - 1, which is the same geometrically as to
disregard the initial point of the division. We thus obtain
the equation

$$z^{n-1} + z^{n-2} + \dots + z + 1 = 0,$$

which may be called the cyclotomic equation. As noticed
above, we may confine our attention to the cases where n is
a prime number or a power of a prime number. We shall
first investigate the case when n = p. The essential point of
the proof is to show that the above equation is irreducible.
For since, as we have seen, irreducible equations can only be
solved by means of square roots in finite number when their
degree is a power of 2, a division into p parts is always
impossible when p - 1 is not equal to a power of 2, i.e., when

$$p \neq 2^h + 1 = 2^{2^\mu} + 1.$$

Thus we see why Gauss's prime numbers occupy such an
exceptional position.

6. At this point we introduce a lemma known as Gauss's
Lemma. If

$$F(z) = z^m + Az^{m-1} + Bz^{m-2} + \dots + Lz + M,$$




where A, B, . . . are integers, and F(z) can be resolved into
two rational factors f(z) and $\varphi(z)$, so that

$$F(z) = f(z)\cdot\varphi(z) = (z^{m'} + \alpha_1 z^{m'-1} + \alpha_2 z^{m'-2} + \dots)
(z^{m''} + \beta_1 z^{m''-1} + \beta_2 z^{m''-2} + \dots),$$

then must the $\alpha$'s and $\beta$'s also be integers. In other
words:

If an integral expression can be resolved into rational factors
these factors must be integral expressions.

Let us suppose the $\alpha$'s and $\beta$'s to be fractional. In each
factor reduce all the coefficients to the least common
denominator. Let $a_0$ and $b_0$ be these common denominators.
Finally multiply both members of our equation by $a_0 b_0$. It
takes the form

$$a_0 b_0 F(z) = f_1(z)\,\varphi_1(z) = (a_0 z^{m'} + a_1 z^{m'-1} + \dots)
(b_0 z^{m''} + b_1 z^{m''-1} + \dots).$$

The a's are integral and prime to one another, as also the b's,
since $a_0$ and $b_0$ are the least common denominators.

Suppose $a_0$ and $b_0$ different from unity and let q be a prime
divisor of $a_0 b_0$. Further, let $a_i$ be the first coefficient of $f_1(z)$
and $b_k$ the first coefficient of $\varphi_1(z)$ not divisible by q. Let
us develop the product $f_1(z)\,\varphi_1(z)$ and consider the coefficient
of $z^{m'+m''-i-k}$. It will be

$$a_i b_k + a_{i-1} b_{k+1} + a_{i-2} b_{k+2} + \dots + a_{i+1} b_{k-1} + a_{i+2} b_{k-2} + \dots$$

According to our hypotheses, all the terms after the first are
divisible by q, but the first is not. Hence this coefficient is not
divisible by q. Now the coefficient of $z^{m'+m''-i-k}$ in the first
member is divisible by $a_0 b_0$, i.e., by q. Hence if the identity
is true it is impossible for a coefficient not divisible by q to
occur in each polynomial. The coefficients of one at least of
the polynomials are then all divisible by q. Here is another
absurdity, since we have seen that all the coefficients are
prime to one another. Hence we cannot suppose $a_0$ and $b_0$
different from 1, and consequently the $\alpha$'s and $\beta$'s are
integral.


7. In order to show that the cyclotomic equation is
irreducible, it is sufficient to show by Gauss's Lemma that the
first member cannot be resolved into factors with integral
coefficients. To this end we shall employ the simple method
due to Eisenstein, in Crelle's Journal, XXXIX, p. 167, which
depends upon the substitution

$$z = x + 1.$$

We obtain

$$f(z) = \frac{z^p - 1}{z - 1} = \frac{(x+1)^p - 1}{x}
= x^{p-1} + p\,x^{p-2} + \frac{p(p-1)}{1\cdot 2}\,x^{p-3} + \dots + \frac{p(p-1)}{1\cdot 2}\,x + p = 0.$$

All the coefficients of the expanded member except the first
are divisible by p; the last coefficient is always p itself, by
hypothesis a prime number. An expression of this class is
always irreducible.

For if this were not the case we should have

$$f(x + 1) = (x^m + a_1 x^{m-1} + \dots + a_{m-1} x + a_m)
(x^{m'} + b_1 x^{m'-1} + \dots + b_{m'-1} x + b_{m'}),$$

where the a's and b's are integers.

Since the term of zero degree in the above expression of
f(z) is p, we have $a_m b_{m'} = p$. p being prime, one of the
factors of $a_m b_{m'}$ must be unity. Suppose, then,

$$a_m = \pm p,\quad b_{m'} = \pm 1.$$

Equating the coefficients of the terms in x, we have

$$\frac{p(p-1)}{1\cdot 2} = a_{m-1} b_{m'} + a_m b_{m'-1}.$$

The first member and the second term of the second being
divisible by p, $a_{m-1} b_{m'}$ must be so also. Since $b_{m'} = \pm 1$,
$a_{m-1}$ is divisible by p. Equating the coefficients of the terms
in $x^2$ we may show that $a_{m-2}$ is divisible by p. Similarly
we show that all of the remaining coefficients of the factor
$x^m + a_1 x^{m-1} + \dots + a_{m-1} x + a_m$ are divisible by p. But
this cannot be true of the coefficient of $x^m$, which is 1.
The assumed equality is impossible and hence the
cyclotomic equation is irreducible when p is a prime.
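
The divisibility pattern on which the argument rests can be checked mechanically, since the coefficients of $\frac{(x+1)^p - 1}{x}$ are binomial coefficients. The sketch below assumes Python 3.8 or later for `math.comb`; the function names are illustrative, and the check is what is now usually called Eisenstein's criterion.

```python
from math import comb

def shifted_cyclotomic_coeffs(p):
    """Coefficients of ((x+1)^p - 1)/x, highest degree first."""
    # ((x+1)^p - 1)/x = sum over k = 1..p of C(p, k) * x^(k-1)
    return [comb(p, k) for k in range(p, 0, -1)]

def satisfies_eisenstein(coeffs, p):
    """Leading coefficient prime to p, all others divisible by p,
    constant term not divisible by p*p."""
    leading, rest, constant = coeffs[0], coeffs[1:], coeffs[-1]
    return (leading % p != 0
            and all(c % p == 0 for c in rest)
            and constant % (p * p) != 0)

print(shifted_cyclotomic_coeffs(5))
print(all(satisfies_eisenstein(shifted_cyclotomic_coeffs(p), p)
          for p in (3, 5, 7, 17)))
```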

8. We now consider the case where n is a power of a
prime number, say $n = p^\alpha$. We propose to show that when
p > 2 the division of the circle into $p^2$ equal parts is
impossible. The general problem will then be solved, since the
division into $p^\alpha$ equal parts evidently includes the division
into $p^2$ equal parts.

The cyclotomic equation is now

$$\frac{z^{p^2} - 1}{z - 1} = 0.$$

It admits as roots extraneous to the problem those which
come from the division into p equal parts, i.e., the roots of
the equation

$$\frac{z^p - 1}{z - 1} = 0.$$

Suppressing these roots by division we obtain

$$f(z) = \frac{z^{p^2} - 1}{z^p - 1} = 0$$

as the cyclotomic equation. This may be written

$$z^{p(p-1)} + z^{p(p-2)} + \dots + z^p + 1 = 0.$$

Transforming by the substitution

$$z = x + 1,$$

we have

$$(x + 1)^{p(p-1)} + (x + 1)^{p(p-2)} + \dots + (x + 1)^p + 1 = 0.$$




The number of terms being p, the term independent of x after
development will be equal to p, and the sum will take the
form

$$x^{p(p-1)} + p\cdot\chi(x),$$

where $\chi(x)$ is a polynomial with integral coefficients whose
constant term is 1. We have just shown that such an
expression is always irreducible. Consequently the new cyclotomic
equation is also irreducible.

The degree of this equation is p(p - 1). On the other
hand an irreducible equation is solvable by square roots only
when its degree is a power of 2. Hence a circle is divisible
into $p^2$ equal parts only when p = 2, p being assumed to be a
prime.

The same is true, as already noted, for the division into $p^\alpha$
equal parts when $\alpha > 2$.


CHAPTER IV.


The Construction of the Regular Polygon of 17 Sides. 

1. We have just seen that the division of the circle into 
equal parts by the straight edge and compasses is possible 
only for the prime numbers studied by Gauss. It will now 
be of interest to learn how the construction can actually be 
effected. 

The purpose of this chapter, then, will be to show in an 
elementary way how to inscribe in the circle the regular poly- 
gon of 17 sides. 

Since we possess as yet no method of construction based 
upon considerations purely geometrical, we must follow the 
path indicated by our general discussions. We consider, first 
of all, the roots of the cyclotomic equation 

$$x^{16} + x^{15} + \dots + x^2 + x + 1 = 0,$$

and construct geometrically the expression, formed of square 
roots, deduced from it. 

We know that the roots can be put into the transcendental
form

$$\varepsilon_\kappa = \cos\frac{2\kappa\pi}{17} + i\sin\frac{2\kappa\pi}{17}
\qquad (\kappa = 1, 2, \dots, 16);$$

and if

$$\varepsilon_1 = \cos\frac{2\pi}{17} + i\sin\frac{2\pi}{17},$$

that

$$\varepsilon_\kappa = \varepsilon_1^{\kappa}.$$

Geometrically, these roots are represented in the complex
plane by the vertices, different from 1, of the regular polygon
of 17 sides inscribed in a circle of radius 1, having the origin
as center. The selection of $\varepsilon_1$ is arbitrary, but for the
construction it is essential to indicate some $\varepsilon$ as the point of
departure. Having fixed upon $\varepsilon_1$, the angle corresponding to
$\varepsilon_\kappa$ is $\kappa$ times the angle corresponding to $\varepsilon_1$, which
completely determines $\varepsilon_\kappa$.


2. The fundamental idea of the solution is the following:
Forming a primitive root to the modulus 17 we may arrange
the 16 roots of the equation in a cycle in a determinate order.

As already stated, a number a is said to be a primitive root
to the modulus 17 when the congruence

$$a^s \equiv +1 \pmod{17}$$

has for least solution s = 17 - 1 = 16. The number 3
possesses this property; for we have

$$\begin{array}{llll}
3^1 \equiv 3 & 3^5 \equiv 5 & 3^9 \equiv 14 & 3^{13} \equiv 12 \\
3^2 \equiv 9 & 3^6 \equiv 15 & 3^{10} \equiv 8 & 3^{14} \equiv 2 \\
3^3 \equiv 10 & 3^7 \equiv 11 & 3^{11} \equiv 7 & 3^{15} \equiv 6 \\
3^4 \equiv 13 & 3^8 \equiv 16 & 3^{12} \equiv 4 & 3^{16} \equiv 1
\end{array} \pmod{17}.$$
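
The table of remainders is easily reproduced with modular exponentiation. A minimal sketch, using Python's built-in three-argument `pow`; the variable names are illustrative.

```python
# Remainders of 3^1, 3^2, ..., 3^16 modulo 17.
residues = [pow(3, k, 17) for k in range(1, 17)]
print(residues)

# 3 is a primitive root exactly when every remainder 1..16 occurs once.
is_primitive_root = sorted(residues) == list(range(1, 17))
print(is_primitive_root)
```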


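Let us then arrange the roots $\varepsilon_\kappa$ so that their subscripts
are the preceding remainders in order

$$\varepsilon_3, \varepsilon_9, \varepsilon_{10}, \varepsilon_{13}, \varepsilon_5, \varepsilon_{15}, \varepsilon_{11}, \varepsilon_{16},
\varepsilon_{14}, \varepsilon_8, \varepsilon_7, \varepsilon_4, \varepsilon_{12}, \varepsilon_2, \varepsilon_6, \varepsilon_1.$$

Notice that if r is the remainder of $3^\kappa$ (mod 17), we have

$$3^\kappa = 17q + r,$$

whence

$$\varepsilon_r = \varepsilon_1^{\,r} = \varepsilon_1^{\,3^\kappa}.$$

If r' is the next remainder, we have similarly

$$\varepsilon_{r'} = \varepsilon_1^{\,3^{\kappa+1}} = \left(\varepsilon_1^{\,3^\kappa}\right)^3 = (\varepsilon_r)^3.$$

Hence in this series of roots each root is the cube of the preceding.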

Gauss’s method consists in decomposing this cycle into 
sums containing 8, 4, 2, 1 roots respectively, corresponding 
to the divisors of 16. Each of these sums is called a period. 




The periods thus obtained may be calculated successively as 
roots of certain quadratic equations. 

The process just outlined is only a particular case of that 
employed in the general case of the division into p equal 
parts. The p — 1 roots of the cyclotomic equation are cyclic- 
ally arranged by means of a primitive root of p, and the 
periods may be calculated as roots of certain auxiliary equa- 
tions. The degree of these last depends upon the prime fac- 
tors of p — 1. They are not necessarily equations of the 
second degree. 

The general case has, of course, been treated in detail by 
Gauss in his Disquisitiones, and also by Bachmann in his 
work, Die Lehre von der Kreisteilung (Leipzig, 1872). 

3. In our case of the 16 roots the periods may be formed
in the following manner: Form two periods of 8 roots by
taking in the cycle, first, the roots of even order, then those
of odd order. Designate these periods by $x_1$ and $x_2$, and
replace each root by its index. We may then write
symbolically

$$x_1 = 9 + 13 + 15 + 16 + 8 + 4 + 2 + 1,$$
$$x_2 = 3 + 10 + 5 + 11 + 14 + 7 + 12 + 6.$$

Operating upon $x_1$ and $x_2$ in the same way, we form 4 periods
of 4 terms:

$$y_1 = 13 + 16 + 4 + 1,$$
$$y_2 = 9 + 15 + 8 + 2,$$
$$y_3 = 10 + 11 + 7 + 6,$$
$$y_4 = 3 + 5 + 14 + 12.$$

Operating in the same way upon the y's, we obtain 8 periods
of 2 terms:

$$\begin{array}{ll}
z_1 = 16 + 1, & z_5 = 11 + 6, \\
z_2 = 13 + 4, & z_6 = 10 + 7, \\
z_3 = 15 + 2, & z_7 = 5 + 12, \\
z_4 = 9 + 8, & z_8 = 3 + 14.
\end{array}$$



It now remains to show that these periods can be calculated 
successively by the aid of square roots. 
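
The successive splitting of the cycle into periods can be sketched as follows. The helper names `split` and `value` are assumptions of this example; "even order" is counted from 1, so it corresponds to the list slice `[1::2]`.

```python
import cmath
import math

# Cycle of indices 3^1, 3^2, ..., 3^16 modulo 17.
cycle = [pow(3, k, 17) for k in range(1, 17)]

def split(period):
    """Return (terms of even order, terms of odd order), counting from 1."""
    return period[1::2], period[0::2]

x1, x2 = split(cycle)
y1, y2 = split(x1)
y3, y4 = split(x2)
z = []  # z[0] is z_1, ..., z[7] is z_8
for y in (y1, y2, y3, y4):
    z.extend(split(y))

def value(period):
    """Numerical value of a period: the sum of the roots with these indices."""
    return sum(cmath.exp(2j * math.pi * r / 17) for r in period)

print(x1, x2)
print(z)
print(value(x1) + value(x2))  # the sum of all 16 roots, close to -1
```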


4. It is readily seen that the sum of the remainders
corresponding to the roots forming a period z is always equal to 17.
These roots are then $\varepsilon_r$ and $\varepsilon_{17-r}$;

$$\varepsilon_r = \cos r\frac{2\pi}{17} + i\sin r\frac{2\pi}{17},$$

$$\varepsilon_{r'} = \varepsilon_{17-r} = \cos(17 - r)\frac{2\pi}{17} + i\sin(17 - r)\frac{2\pi}{17}
= \cos r\frac{2\pi}{17} - i\sin r\frac{2\pi}{17}.$$

Hence

$$\varepsilon_r + \varepsilon_{r'} = 2\cos r\frac{2\pi}{17}.$$

Therefore all the periods z are real, and we readily obtain

$$\begin{array}{ll}
z_1 = 2\cos\frac{2\pi}{17}, & z_5 = 2\cos 6\frac{2\pi}{17}, \\
z_2 = 2\cos 4\frac{2\pi}{17}, & z_6 = 2\cos 7\frac{2\pi}{17}, \\
z_3 = 2\cos 2\frac{2\pi}{17}, & z_7 = 2\cos 5\frac{2\pi}{17}, \\
z_4 = 2\cos 8\frac{2\pi}{17}, & z_8 = 2\cos 3\frac{2\pi}{17}.
\end{array}$$

Moreover, by definition,

$$x_1 = z_1 + z_2 + z_3 + z_4,\qquad x_2 = z_5 + z_6 + z_7 + z_8,$$

$$y_1 = z_1 + z_2,\quad y_2 = z_3 + z_4,\quad y_3 = z_5 + z_6,\quad y_4 = z_7 + z_8.$$

5. It will be necessary to determine the relative magnitude
of the different periods. For this purpose we shall employ
the following artifice: We divide the semicircle of unit radius
into 17 equal parts and denote by $S_1, S_2, \dots, S_{17}$ the distances
of the consecutive points of division $A_1, A_2, \dots, A_{17}$ from the
initial point O of the semicircle, $S_{17}$ being equal to the
diameter, i.e., equal to 2. The angle $A_\kappa A_{17} O$ has the same
measure as the half of the arc $A_\kappa O$, that is, $\frac{\kappa\pi}{34}$. Hence

$$S_\kappa = 2\sin\frac{\kappa\pi}{34} = 2\cos\frac{(17 - \kappa)\pi}{34}.$$

That this may be identical with $2\cos h\frac{2\pi}{17}$, we must have

$$4h = 17 - \kappa,$$

$$\kappa = 17 - 4h.$$

Giving to h the values 1, 2, 3, 4, 5, 6, 7, 8, we find for $\kappa$
the values 13, 9, 5, 1, -3, -7, -11, -15. Hence

$$\begin{array}{ll}
z_1 = S_{13}, & z_5 = -S_7, \\
z_2 = S_1, & z_6 = -S_{11}, \\
z_3 = S_9, & z_7 = -S_3, \\
z_4 = -S_{15}, & z_8 = S_5.
\end{array}$$

The figure shows that $S_\kappa$ increases with the subscript; hence
the order of increasing magnitude of the periods z is

$$z_4,\ z_6,\ z_5,\ z_7,\ z_2,\ z_8,\ z_3,\ z_1.$$

Moreover, the chord $A_\kappa A_{\kappa+p}$ subtends p divisions of the
semicircumference and is equal to $S_p$; the triangle $O A_\kappa A_{\kappa+p}$
shows that

$$S_{\kappa+p} < S_\kappa + S_p,$$

and a fortiori

$$S_{\kappa+p} < S_{\kappa+r} + S_{p+r'}.$$

Calculating the differences two and two of the periods y, we
easily find

$$\begin{aligned}
y_1 - y_2 &= S_{13} + S_1 - S_9 + S_{15} > 0, \\
y_1 - y_3 &= S_{13} + S_1 + S_7 + S_{11} > 0, \\
y_1 - y_4 &= S_{13} + S_1 + S_3 - S_5 > 0, \\
y_2 - y_3 &= S_9 - S_{15} + S_7 + S_{11} > 0, \\
y_2 - y_4 &= S_9 - S_{15} + S_3 - S_5 < 0, \\
y_3 - y_4 &= -S_7 - S_{11} + S_3 - S_5 < 0.
\end{aligned}$$

Hence

$$y_3 < y_2 < y_4 < y_1.$$

Finally we obtain in a similar way

$$x_2 < x_1.$$
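
These orderings, and the expression of the periods z through the chords $S_\kappa$, can be confirmed in floating point. This is only a numerical sketch; the helper names `S` and `c` and the dictionaries are assumptions of the example.

```python
import math

def S(k):
    """Chord from the initial point to the k-th division of the semicircle."""
    return 2 * math.sin(k * math.pi / 34)

def c(h):
    """2 cos(h * 2*pi/17)."""
    return 2 * math.cos(2 * math.pi * h / 17)

# Periods z_1..z_8 as cosines, and the same periods as signed chords.
z = {1: c(1), 2: c(4), 3: c(2), 4: c(8), 5: c(6), 6: c(7), 7: c(5), 8: c(3)}
chords = {1: S(13), 2: S(1), 3: S(9), 4: -S(15),
          5: -S(7), 6: -S(11), 7: -S(3), 8: S(5)}

y = {1: z[1] + z[2], 2: z[3] + z[4], 3: z[5] + z[6], 4: z[7] + z[8]}

z_order = sorted(z, key=z.get)  # subscripts in order of increasing magnitude
y_order = sorted(y, key=y.get)
print(z_order, y_order)
```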

6. We now propose to calculate $z_1 = 2\cos\frac{2\pi}{17}$. After
making this calculation and constructing $z_1$, we can easily deduce
the side of the regular polygon of 17 sides. In order to find
the quadratic equation satisfied by the periods, we proceed to
determine symmetric functions of the periods.

Associating $z_1$ with the period $z_2$ and thus forming the
period $y_1$, we have, first,

$$z_1 + z_2 = y_1.$$

Let us now determine $z_1 z_2$. We have

$$z_1 z_2 = (16 + 1)(13 + 4),$$

where the symbolic product $\kappa p$ represents

$$\varepsilon_\kappa \cdot \varepsilon_p = \varepsilon_{\kappa + p}.$$

Hence it should be represented symbolically by $\kappa + p$,
remembering to subtract 17 from $\kappa + p$ as often as possible. Thus,

$$z_1 z_2 = 12 + 3 + 14 + 5 = y_4.$$

Therefore $z_1$ and $z_2$ are the roots of the quadratic equation

($\zeta$) $z^2 - y_1 z + y_4 = 0$,

whence, since $z_1 > z_2$,

$$z_1 = \frac{y_1 + \sqrt{y_1^2 - 4y_4}}{2},\qquad z_2 = \frac{y_1 - \sqrt{y_1^2 - 4y_4}}{2}.$$




We must now determine $y_1$ and $y_4$. Associating $y_1$ with the
period $y_2$, thus forming the period $x_1$, and $y_3$ with the period
$y_4$, thus forming the period $x_2$, we have, first,

$$y_1 + y_2 = x_1.$$

Then,

$$y_1 y_2 = (13 + 16 + 4 + 1)(9 + 15 + 8 + 2).$$

Expanding symbolically, the second member becomes equal
to the sum of all the roots; that is, to -1. Therefore $y_1$
and $y_2$ are the roots of the equation

($\eta$) $y^2 - x_1 y - 1 = 0$,

whence, since $y_1 > y_2$,

$$y_1 = \frac{x_1 + \sqrt{x_1^2 + 4}}{2},\qquad y_2 = \frac{x_1 - \sqrt{x_1^2 + 4}}{2}.$$

Similarly,

$$y_3 + y_4 = x_2$$

and

$$y_3 y_4 = -1.$$

Hence $y_3$ and $y_4$ are the roots of the equation

($\eta'$) $y^2 - x_2 y - 1 = 0$;

whence, since $y_4 > y_3$,

$$y_4 = \frac{x_2 + \sqrt{x_2^2 + 4}}{2},\qquad y_3 = \frac{x_2 - \sqrt{x_2^2 + 4}}{2}.$$

It now remains to determine $x_1$ and $x_2$. Since $x_1 + x_2$ is
equal to the sum of all the roots,

$$x_1 + x_2 = -1.$$

Further,

$$x_1 x_2 = (13 + 16 + 4 + 1 + 9 + 15 + 8 + 2)(10 + 11 + 7 + 6 + 3 + 5 + 14 + 12).$$

Expanding symbolically, each root occurs 4 times, and thus

$$x_1 x_2 = -4.$$
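
The symbolic multiplication, in which indices are added and reduced modulo 17, lends itself to a mechanical check. A minimal sketch; the function name `symbolic_product` is an assumption of this example, and each product is returned as a tally of how often every index occurs.

```python
from collections import Counter

def symbolic_product(first, second):
    """Multiply two sums of roots given by their index lists (mod 17)."""
    return Counter((k + p) % 17 for k in first for p in second)

z1, z2 = [16, 1], [13, 4]
y1, y2 = [13, 16, 4, 1], [9, 15, 8, 2]
y3, y4 = [10, 11, 7, 6], [3, 5, 14, 12]
x1, x2 = y1 + y2, y3 + y4

print(sorted(symbolic_product(z1, z2)))       # indices of y4
print(symbolic_product(y1, y2))               # every index 1..16 once: sum is -1
print(set(symbolic_product(x1, x2).values())) # every index 4 times: product is -4
```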




Therefore $x_1$ and $x_2$ are the roots of the quadratic

($\xi$) $x^2 + x - 4 = 0$;

whence, since $x_1 > x_2$,

$$x_1 = \frac{-1 + \sqrt{17}}{2},\qquad x_2 = \frac{-1 - \sqrt{17}}{2}.$$

Solving equations $\xi$, $\eta$, $\eta'$, $\zeta$ in succession, $z_1$ is determined
by a series of square roots.

Effecting the calculations, we see that $z_1$ depends upon the
four square roots

$$\sqrt{17},\quad \sqrt{x_1^2 + 4},\quad \sqrt{x_2^2 + 4},\quad \sqrt{y_1^2 - 4y_4}.$$

If we wish to reduce $z_1$ to the normal form we must see
whether any one of these square roots can be expressed
rationally in terms of the others.

Now, from the roots of ($\eta$) and ($\eta'$),

$$\sqrt{x_1^2 + 4} = y_1 - y_2,$$

$$\sqrt{x_2^2 + 4} = y_4 - y_3.$$

Expanding symbolically, we verify that

$$(y_1 - y_2)(y_4 - y_3) = 2(x_1 - x_2),*$$

* $(y_1 - y_2)(y_4 - y_3) = (13 + 16 + 4 + 1 - 9 - 15 - 8 - 2)(3 + 5 + 14 + 12 - 10 - 11 - 7 - 6)$

 = 16 + 1 + 10 + 8 - 6 - 7 - 3 - 2
 + 2 + 4 + 13 + 11 - 9 - 10 - 6 - 5
 + 7 + 9 + 1 + 16 - 14 - 15 - 11 - 10
 + 4 + 6 + 15 + 13 - 11 - 12 - 8 - 7
 - 12 - 14 - 6 - 4 + 2 + 3 + 16 + 15
 - 1 - 3 - 12 - 10 + 8 + 9 + 5 + 4
 - 11 - 13 - 5 - 3 + 1 + 2 + 15 + 14
 - 5 - 7 - 16 - 14 + 12 + 13 + 9 + 8

 = 2(16 + 1 + 8 + 2 + 4 + 13 + 15 + 9 - 10 - 6 - 7 - 3 - 11 - 5 - 14 - 12)

 = $2(x_1 - x_2)$.




that is,

$$\sqrt{x_1^2 + 4}\,\sqrt{x_2^2 + 4} = 2\sqrt{17}.$$

Hence $\sqrt{x_2^2 + 4}$ can be expressed rationally in terms of the
other two square roots. This equation shows that if two of
the three differences $y_1 - y_2$, $y_4 - y_3$, $x_1 - x_2$ are positive, the
same is true of the third, which agrees with the results
obtained directly.

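Replacing now $x_1$, $y_1$, $y_4$ by their numerical values, we
obtain in succession

$$x_1 = \frac{-1 + \sqrt{17}}{2},$$

$$y_1 = \frac{-1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}}}{4},$$

$$y_4 = \frac{-1 - \sqrt{17} + \sqrt{34 + 2\sqrt{17}}}{4},$$

$$z_1 = \frac{-1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}}}{8}
+ \frac{\sqrt{68 + 12\sqrt{17} - 16\sqrt{34 + 2\sqrt{17}} - 2(1 - \sqrt{17})\sqrt{34 - 2\sqrt{17}}}}{8}.$$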
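
The closed forms for $x_1$, $y_1$, $y_4$ and $z_1$ can be compared numerically with $2\cos\frac{2\pi}{17}$. The following sketch is a floating-point check only, not a proof; the variable names follow the text.

```python
import math

r17 = math.sqrt(17)

x1 = (-1 + r17) / 2
y1 = (-1 + r17 + math.sqrt(34 - 2 * r17)) / 4
y4 = (-1 - r17 + math.sqrt(34 + 2 * r17)) / 4
z1 = y1 / 2 + math.sqrt(
    68 + 12 * r17
    - 16 * math.sqrt(34 + 2 * r17)
    - 2 * (1 - r17) * math.sqrt(34 - 2 * r17)
) / 8

# z1 should also be the greater root of z^2 - y1*z + y4 = 0.
z1_from_quadratic = (y1 + math.sqrt(y1 * y1 - 4 * y4)) / 2

print(z1, z1_from_quadratic, 2 * math.cos(2 * math.pi / 17))
```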

The algebraic part of the solution of our problem is now 
completed. We have already remarked that there is no known 
construction of the regular polygon of 17 sides based upon 
purely geometric considerations. There remains, then, only 
the geometric translation of the individual algebraic steps. 

7. We may be allowed to introduce here a brief historical 
account of geometric constructions with straight edge and 
compasses. 

In the geometry of the ancients the straight edge and com- 
passes were always used together ; the difficulty lay merely in 
bringing together the different parts of the figure so as not to 




draw any unnecessary lines. Whether the several steps in 
the construction were made with straight edge or with com- 
passes was a matter of indifference. 

On the contrary, in 1797, the Italian Mascheroni succeeded 
in effecting all these constructions with the compasses alone ; 
he set forth his methods in his Geometria del compasso, and 
claimed that constructions with compasses were practically 
more exact than those with the straight edge. As he ex- 
pressly stated, he wrote for mechanics, and therefore with a 
practical end in view. Mascheroni's original work is difficult
to read, and we are under obligations to Hutt for furnishing
a brief résumé in German, Die Mascheroni'schen Constructionen
(Halle, 1880).

Soon after, the French, especially the disciples of Carnot,
the author of the Géométrie de position, strove, on the other
hand, to effect their constructions as far as possible with
the straight edge. (See also Lambert, Freie Perspective,
1774.)

Here we may ask a question which algebra enables us to
answer immediately: In what cases can the solution of an
algebraic problem be constructed with the straight edge alone?
The answer is not given with sufficient explicitness by the
authors mentioned. We shall say:

With the straight edge alone we can construct all algebraic 
expressions whose form is rational. 

With a similar view Brianchon published in 1818 a paper,
Les applications de la théorie des transversales, in which he shows
how his constructions can be effected in many cases with the
straight edge alone. He likewise insists upon the practical
value of his methods, which are especially adapted to field
work in surveying.

Poncelet was the first, in his Traité des propriétés projectives
(Vol. I, Nos. 351-357), to conceive the idea that it is sufficient
to use a single fixed circle in connection with the straight lines



of the plane in order to construct all expressions depending 
upon square roots, the center of the fixed circle being given. 

This thought was developed by Steiner in 1833 in a
celebrated paper entitled Die geometrischen Constructionen,
ausgeführt mittels der geraden Linie und eines festen Kreises, als
Lehrgegenstand für höhere Unterrichtsanstalten und zum
Selbstunterricht.

8. To construct the regular polygon of 17 sides we shall
follow the method indicated by von Staudt (Crelle's Journal,
Vol. XXIV, 1842), modified later by Schröter (Crelle's Journal,
Vol. LXXV, 1872). The construction of the regular
polygon of 17 sides is made in accordance with the methods
indicated by Poncelet and Steiner, inasmuch as besides the
straight edge but one fixed circle is used.*

First, we will show how with the straight edge and one fixed
circle we can solve every quadratic equation.

At the extremities of a diameter of the fixed unit circle
(Fig. 4) we draw two tangents, and select the lower as the
axis of X, and the diameter perpendicular to it as the
axis of Y. Then the equation of the circle is

$$x^2 + y(y - 2) = 0.$$

Let

$$x^2 - px + q = 0$$

be any quadratic equation with real roots $x_1$ and $x_2$. Required
to construct the roots $x_1$ and $x_2$ upon the axis of X.

Lay off upon the upper tangent from A to the right, a
segment measured by $\frac{4}{p}$; upon the axis of X from O, a segment
measured by $\frac{q}{p}$; connect the extremities of these segments by
the line 3 and project the intersections of this line with the
circle from A, by the lines 1 and 2, upon the axis of X. The
segments thus cut off upon the axis of X are measured by $x_1$
and $x_2$.

* A Mascheroni construction of the regular polygon of 17 sides by
L. Gérard is given in Math. Annalen, Vol. XLVIII, 1896, pp. 390-392.

Proof. Calling the intercepts upon the axis of X, $x_1$ and $x_2$,
we have the equation of the line 1,

$$2x + x_1(y - 2) = 0;$$

of the line 2,

$$2x + x_2(y - 2) = 0.$$

If we multiply the first members of these two equations we
get, after dividing by 4,

$$x^2 + \frac{x_1 + x_2}{2}\,x(y - 2) + \frac{x_1 x_2}{4}(y - 2)^2 = 0$$

as the equation of the line pair formed by 1 and 2.
Subtracting from this the equation of the circle, we obtain

$$\frac{x_1 + x_2}{2}\,x(y - 2) + \frac{x_1 x_2}{4}(y - 2)^2 - y(y - 2) = 0.$$

This is the equation of a conic passing through the four
intersections of the lines 1 and 2 with the circle. From
this equation we can remove the factor y - 2,
corresponding to the tangent, and we have left

$$\frac{x_1 + x_2}{2}\,x + \frac{x_1 x_2}{4}(y - 2) - y = 0,$$

which is the equation of the line 3. If we now make
$x_1 + x_2 = p$ and $x_1 x_2 = q$, we get

$$\frac{p}{2}\,x + \frac{q}{4}(y - 2) - y = 0,$$

and the transversal 3 cuts off from the line y = 2 the
segment $\frac{4}{p}$, and from the line y = 0 the segment $\frac{q}{p}$. Thus the
correctness of the construction is established.
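
The construction can be imitated in coordinates: draw the transversal, intersect it with the fixed circle, and project the intersections from A = (0, 2) upon the axis of X. The sketch below is an analytic stand-in for the drawing, not a substitute for the proof; the function name `construct_roots` is an assumption, and the case p = 0 is excluded because the segments 4/p and q/p then do not exist.

```python
import math

def construct_roots(p, q):
    """Roots of x^2 - p*x + q = 0 via the fixed circle x^2 + y(y - 2) = 0."""
    # Line 3 joins (q/p, 0) on the axis of X to (4/p, 2) on the upper tangent.
    x0, dx = q / p, (4 - q) / p
    # Points of line 3 are (x0 + t*dx, 2t); substitute into the circle.
    a = dx * dx + 4
    b = 2 * x0 * dx - 4
    c = x0 * x0
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("line 3 misses the circle: the roots are not real")
    roots = []
    for sign in (1, -1):
        t = (-b + sign * math.sqrt(disc)) / (2 * a)
        x, y = x0 + t * dx, 2 * t
        # Project from A = (0, 2) upon the axis of X (y = 0).
        roots.append(2 * x / (2 - y))
    return sorted(roots)

print(construct_roots(3, 2))    # x^2 - 3x + 2 = 0
print(construct_roots(-1, -4))  # x^2 + x - 4 = 0
```

The discriminant of the auxiliary quadratic in t works out to $16(p^2 - 4q)/p^2$, so the transversal meets the circle exactly when the proposed equation has real roots.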

9. In accordance with the method just explained, we shall
now construct the roots of our four quadratic equations.
They are (see Section 6 of this chapter)

($\xi$) $x^2 + x - 4 = 0$, with roots $x_1$ and $x_2$; $x_1 > x_2$,

($\eta$) $y^2 - x_1 y - 1 = 0$, with roots $y_1$ and $y_2$; $y_1 > y_2$,

($\eta'$) $y^2 - x_2 y - 1 = 0$, with roots $y_3$ and $y_4$; $y_4 > y_3$,

($\zeta$) $z^2 - y_1 z + y_4 = 0$, with roots $z_1$ and $z_2$; $z_1 > z_2$.

These will furnish

$$z_1 = 2\cos\frac{2\pi}{17},$$

whence it is easy to construct the polygon desired. We
notice further that to construct $z_1$ it is sufficient to construct
$x_1$, $x_2$, $y_1$, $y_4$.

We then lay off the following segments: upon the upper
tangent, y = 2,

$$-4,\quad \frac{4}{x_1},\quad \frac{4}{x_2},\quad \frac{4}{y_1};$$

upon the axis of X,

$$+4,\quad -\frac{1}{x_1},\quad -\frac{1}{x_2},\quad \frac{y_4}{y_1}.$$
Xj x 2 y! 

This may all be done in the following manner: The
straight line connecting the point + 4 upon the axis of X
with the point - 4 upon the tangent y = 2 cuts the circle in
two points, the projection of which from the point A (0, 2),
the upper vertex of the circle, gives the two roots $x_1$, $x_2$ of the
first quadratic equation as intercepts upon the axis of X.

To solve the second equation we have to lay off $\frac{4}{x_1}$ above
and $-\frac{1}{x_1}$ below.

To determine the first point we connect $x_1$ upon the axis of
X with A, the upper vertex, and from O, the lower vertex,
draw another straight line through the intersection of this
line with the circle. This cuts off upon the upper tangent
the intercept $\frac{4}{x_1}$. This can easily be shown analytically.

The equation of the line from A to $x_1$ (Fig. 5),

$$2x + x_1 y = 2x_1,$$

and that of the circle,

$$x^2 + y(y - 2) = 0,$$

give as the coordinates of their intersection

$$\frac{4x_1}{x_1^2 + 4},\qquad \frac{2x_1^2}{x_1^2 + 4}.$$

The equation of the line from O through this point becomes

$$y = \frac{x_1}{2}\,x,$$

cutting off upon y = 2 the intercept $\frac{4}{x_1}$.

We reach the same conclusion still more simply by the use
of some elementary notions of projective geometry. By our
construction we have obviously associated with every point x
of the lower range one, and only one, point of the upper, so
that to the point $x = \infty$ corresponds the point x' = 0, and
conversely. Since in such a correspondence there must exist a
linear relation, the abscissa x' of the upper point must satisfy
the equation

$$x' = \frac{\text{const.}}{x}.$$

Since x' = 2 when x = 2, as is obvious from the figure, the
constant = 4.



To determine $-\frac{1}{x_1}$ upon the axis of X we connect the point
- 4 upon the upper with the point + 1 upon the lower
tangent (Fig. 6). The point thus determined upon the vertical
diameter we connect with the point $\frac{4}{x_1}$ above. This line
cuts off upon the axis of X the intercept $-\frac{1}{x_1}$. For the
line from - 4 to + 1,

$$5y + 2x = 2,$$

intersects the vertical diameter in the point $(0, \frac{2}{5})$. Hence
the equation of the line from $\frac{4}{x_1}$ to this point is

$$5y - 2x_1 x = 2,$$

and its intersection with the lower tangent gives $-\frac{1}{x_1}$.

The projection from A of the intersections of the line from
$-\frac{1}{x_1}$ to $\frac{4}{x_1}$ with the circle determines upon the axis of X the
two roots of the second quadratic equation, of which, as
already noted, we need only the greater, $y_1$. This
corresponds, as shown by the figure, to the projection of the upper
intersection of our transversal with the circle.

Similarly, we obtain the roots of the third quadratic
equation. Upon the upper tangent we project from O the
intersection of the circle with the straight line which gave upon
the axis of X the root $+x_2$. This immediately gives the
intercept $\frac{4}{x_2}$, by reason of the correspondence just explained.

If we connect this point with the point where the vertical
diameter intersects the line joining - 4 above and + 1 below,
we cut off upon the axis of X the segment $-\frac{1}{x_2}$, as desired.

If we project that intersection of this transversal with the
circle which lies in the positive quadrant from A upon the
axis of X, we have constructed the required root $y_4$ of the third
quadratic equation.


We have finally to determine the root of the fourth
quadratic equation and for this purpose to lay off $\frac{4}{y_1}$ above and
$\frac{y_4}{y_1}$ below. We solve the first problem in the usual way, by
projecting the intersection of the circle with the line connecting
A with $+y_1$ below, from O upon the upper tangent, thus
obtaining $\frac{4}{y_1}$. For the other segment we connect the point
+ 4 above with $y_4$ below, and then the point thus determined
upon the vertical diameter produced with $\frac{4}{y_1}$. This line cuts
off upon the axis of X exactly the segment desired, $\frac{y_4}{y_1}$. For
the line a (Fig. 8) has the equation

$$(y_4 - 4)\,y + 2x = 2y_4.$$

It cuts off upon the vertical diameter the segment $\frac{2y_4}{y_4 - 4}$.
The equation of the line b is then

$$2y_1 x + (y_4 - 4)\,y = 2y_4,$$

and its intersection with the axis of X has the abscissa $\frac{y_4}{y_1}$.

If we project the upper intersection of the line b with the
circle from A upon the axis of X, we obtain $z_1 = 2\cos\frac{2\pi}{17}$.


If we desire the simple cosine itself we have only to draw a
diameter parallel to the axis of X, on which our last projecting
ray cuts off directly $\cos\frac{2\pi}{17}$. A perpendicular erected at this
point gives immediately the first and sixteenth vertices of the
regular polygon of 17 sides.

The period $z_1$ was chosen arbitrarily; we might construct
in the same way every other period of two terms and so find
the remaining cosines. These constructions, made on separate
figures so as to be followed more easily, have been combined
in a single figure (Fig. 9), which gives the complete
construction of the regular polygon of 17 sides.





Fig. 9. 


CHAPTER V. 


General Considerations on Algebraic Constructions. 

1. We shall now lay aside the matter of construction with 
straight edge and compasses. Before quitting the subject we 
may mention a new and very simple method of effecting cer- 
tain constructions, paper folding. Hermann Wiener * has 
shown how by paper folding we may obtain the network of 
the regular polyhedra. Singularly, about the same time a 
Hindu mathematician, Sundara Row, of Madras, published a 
little book, Geometrical Exercises in Paper Folding (Madras, 
Addison & Co., 1893), in which the same idea is consider- 
ably developed. The author shows how by paper folding we 
may construct by points such curves as the ellipse, cissoid, etc. 

2. Let us now inquire how to solve geometrically prob- 
lems whose analytic form is an equation of the third or of 
higher degree, and in particular, let us see how the ancients 
succeeded. The most natural method is by means of the 
conics, of which the ancients made much use. For example, 
they found that by means of these curves they were enabled 
to solve the problems of the duplication of the cube and the 
trisection of the angle. We shall in this place give only a 
general sketch of the process, making use of the language 
of modern mathematics for greater simplicity. 

Let it be required, for instance, to solve graphically the
cubic equation

$$x^3 + ax^2 + bx + c = 0,$$

or the biquadratic,

$$x^4 + ax^3 + bx^2 + cx + d = 0.$$

* See Dyck, Katalog der Münchener mathematischen Ausstellung von
1893, Nachtrag, p. 52.




Put $x^2 = y$; our equations become

$$xy + ay + bx + c = 0$$

and

$$y^2 + axy + by + cx + d = 0.$$

The roots of the equations proposed are thus the abscissas
of the points of intersection of the two conics.

The equation

$$x^2 = y$$

represents a parabola with axis vertical. The second equation,

$$xy + ay + bx + c = 0,$$

represents an hyperbola whose asymptotes are parallel to the
axes of reference (Fig. 10). One of the four points of intersection
is at infinity upon the axis of Y, the other three at a
finite distance, and their abscissas are the roots of the equation
of the third degree.

In the second case the parabola is the same. The hyper- 
bola (Fig. 11) has again one asymptote parallel to the axis of 
X while the other is no longer perpendicular to this axis. 
The curves now have four points of intersection at a finite 
distance. 
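
The reduction of the cubic to two conics can be checked numerically. The following is a minimal sketch, assuming an illustrative cubic $(x-1)(x-2)(x-3)$ that is not taken from the text; the function names are hypothetical. It verifies that each root r gives a point (r, r^2) of the parabola that also lies on the hyperbola.

```python
def cubic(x, a, b, c):
    """Left-hand side of x^3 + a x^2 + b x + c."""
    return x**3 + a * x**2 + b * x + c


def hyperbola(x, y, a, b, c):
    """Left-hand side of x y + a y + b x + c."""
    return x * y + a * y + b * x + c


# Illustrative cubic (an assumption, not from the text):
# (x - 1)(x - 2)(x - 3) = x^3 - 6 x^2 + 11 x - 6.
a, b, c = -6, 11, -6
roots = [1, 2, 3]

# Each root r gives the point (r, r^2) on the parabola y = x^2;
# the same point must also satisfy the hyperbola equation.
residuals = [hyperbola(r, r * r, a, b, c) for r in roots]
print(residuals)
```

Since substituting y = x^2 into the hyperbola reproduces the cubic exactly, the residuals vanish for every root.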



The methods of the ancient mathematicians are given in 
detail in the elaborate work of M. Cantor, Geschichte der 
Mathematik (Leipzig, 1894, 2d ed.). Especially interesting is 
Zeuthen, Die Kegelschnitte im Altertum (Kopenhagen, 1886, 
in German edition). As a general compendium we may mention
Baltzer, Analytische Geometrie (Leipzig, 1882).

3. Beside the conics, the ancients used for the solution of 
the above-mentioned problems, higher 
curves constructed for this very pur- 
pose. We shall mention here only 
the Cissoid and the Conchoid. 

The cissoid of Diocles (c. 150 B.C.) may be constructed as
follows (Fig. 12): To a circle draw a tangent (in the figure the
vertical tangent on the right) and the diameter perpendicular to
it. Draw lines from O, the vertex of the circle thus determined,
to points upon the tangent, and lay off from O upon each the
segment lying between its intersection with the circle and the
tangent. The locus of points so determined is the cissoid.

To derive the equation, let r be the radius vector, $\theta$ the
angle it makes with the axis of X. If we produce r to the
tangent on the right, and call the diameter of the circle 1,
the total segment equals $\frac{1}{\cos\theta}$. The portion cut off
by the circle is $\cos\theta$. The difference of the two segments
is r, and hence

$$r = \frac{1}{\cos\theta} - \cos\theta = \frac{\sin^2\theta}{\cos\theta}.$$




By transformation of coordinates we obtain the Cartesian
equation,

$$(x^2 + y^2)\,x - y^2 = 0.$$

The curve is of the third order, has a cusp at the origin, 
and is symmetric to the axis of X. The vertical tangent to 
the circle with which we began our construction is an asymp- 
tote. Finally the cissoid cuts the line at infinity in the cir- 
cular points. 

To show how to solve the Delian problem by the use of
this curve, we write its equation in the following form:

$$\left(\frac{y}{x}\right)^3 = \frac{y}{1 - x}.$$

We now construct the straight line,

$$\frac{y}{x} = \lambda.$$

This cuts off upon the tangent x = 1 the segment $\lambda$, and
intersects the cissoid in a point for which

$$\frac{y}{1 - x} = \lambda^3.$$

This is the equation of a straight line passing through the
point y = 0, x = 1, and hence of the line joining this point
to the point of the cissoid.

This line cuts off upon the axis of Y the intercept $\lambda^3$.

We now see how $\sqrt[3]{2}$ may be constructed. Lay off upon
the axis of Y the intercept 2, join this point to the point
x = 1, y = 0, and through its intersection with the cissoid
draw a line from the origin to the tangent x = 1. The intercept
on this tangent equals $\sqrt[3]{2}$.
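
The construction of the cube root can be followed numerically. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and bisection stands in for the graphical intersection of the line with the curve. It finds the point where the line joining (0, 2) to (1, 0) meets the cissoid and then measures the intercept cut off on the tangent x = 1 by the ray from the origin.

```python
def cissoid(x, y):
    """Left-hand side of the cissoid equation (x^2 + y^2) x - y^2 = 0."""
    return (x * x + y * y) * x - y * y


def line_ordinate(x, intercept=2.0):
    """Ordinate on the line joining (0, intercept) to (1, 0)."""
    return intercept * (1.0 - x)


def intersect(intercept=2.0, steps=200):
    """Bisect for the abscissa in 0 < x < 1 where the line meets the cissoid."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if cissoid(mid, line_ordinate(mid, intercept)) < 0:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    return x, line_ordinate(x, intercept)


x, y = intersect()
# The ray from the origin through (x, y) meets the tangent x = 1 at height y / x.
tangent_intercept = y / x
print(tangent_intercept, 2 ** (1 / 3))
```

The two printed values should agree to within rounding error, which is the numerical counterpart of the statement that the intercept equals the cube root of 2.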

4. The conchoid of Nicomedes (c. 150 B.C.) is constructed
as follows: Let O be a fixed point, a its distance from a fixed
line. If we pass a pencil of rays through O and lay off on
each ray from its intersection with the fixed line in both
directions a segment b, the locus of the points so determined
is the conchoid. According as b is greater or less than a,
the origin is a node or a conjugate point; for b = a it is
a cusp (Fig. 13).

Taking for axes of X and Y the perpendicular and parallel
through O to the fixed line, and writing r for the radius vector,
we have

$$\frac{r}{b} = \pm\frac{x}{x - a}, \qquad r = \sqrt{x^2 + y^2},$$

whence

$$(x^2 + y^2)(x - a)^2 - b^2x^2 = 0.$$

The conchoid is then of the fourth order, has a double
point at the origin, and is composed of two branches
having for common asymptote the line x = a. Further, the
factor $(x^2 + y^2)$ shows that the curve passes through the
circular points at infinity, a matter of immediate importance.

Fig. 13.

We may trisect any angle by means of this curve in the
following manner: Let $\phi = \angle MOY$ (Fig. 13) be the angle to
be divided into three equal parts. On the side OM lay off
OM = b, an arbitrary length. With M as a center and radius
b describe a circle, and through M perpendicular to the axis
of X with origin O draw a vertical line representing the
asymptote of the conchoid to be constructed. Construct the
conchoid. Connect O with A, the intersection of the circle
and the conchoid. Then $\angle AOY$ is one third of $\angle\phi$, as is
easily seen from the figure.
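
The trisection can be checked numerically instead of being read from the figure. The sketch below is an illustration only: the function names are hypothetical, and it is assumed that the relevant conchoid branch is the one lying on the same side of the fixed line as O. It takes the point A on the ray making the angle $\phi/3$ with the axis of Y, places it on the conchoid, and measures how far it is from the circle about M.

```python
import math


def conchoid_radius(theta, a, b):
    """Radius vector of the conchoid branch on the same side of the line as O.

    theta is measured from the axis of X; the ray meets the fixed line x = a
    at distance a / cos(theta) from O, and the segment b is laid off toward O.
    """
    return a / math.cos(theta) - b


def trisection_residual(phi, b=1.0):
    """Signed distance of the candidate point A from the circle about M."""
    mx, my = b * math.sin(phi), b * math.cos(phi)  # M, with angle MOY = phi, OM = b
    a = mx                                         # the asymptote passes through M
    theta = math.pi / 2 - phi / 3                  # ray with angle AOY = phi / 3
    r = conchoid_radius(theta, a, b)
    ax, ay = r * math.cos(theta), r * math.sin(theta)
    return math.hypot(ax - mx, ay - my) - b


for degrees in (30, 60, 90, 120):
    print(degrees, trisection_residual(math.radians(degrees)))
```

A residual of zero (up to rounding) means that the conchoid point on the ray at angle $\phi/3$ also lies on the circle, so it is the intersection A used in the construction.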

Our previous investigations have shown us that the prob- 
lem of the trisection of the angle is a problem of the third 
degree. It admits the three solutions

$$\frac{\phi}{3}, \qquad \frac{\phi + 2\pi}{3}, \qquad \frac{\phi + 4\pi}{3}.$$

Every algebraic construction which solves this problem by 
the aid of a curve of higher degree must obviously furnish all 
the solutions. Otherwise the equation of the problem would 
not be irreducible. These different solutions are shown in 
the figure. The circle and the conchoid intersect in eight 
points. Two of them coincide with the origin, two others 
with the circular points at infinity. None of these can give 
a solution of the problem. There remain, then, four points 
of intersection, so that we seem to have one too many. This 
is due to the fact that among the four points we necessarily 
find the point B such that OMB = 2b, a point which may be
determined without the aid of the curve. There actually 
remain then only three points corresponding to the three 
roots furnished by the algebraic solution. 

5. In all these constructions with the aid of higher alge- 
braic curves, we must consider the practical execution. We 
need an instrument which shall trace the curve by a con- 
tinuous movement, for a construction by points is simply a 
method of approximation. Several instruments of this sort 
have been constructed ; some were known to the ancients. 
Nicomedes invented a simple device for tracing the conchoid.
It is the oldest of the kind besides the straight edge and
compasses. (Cantor, I, p. 302.) A list of instruments of
more recent construction may be found in Dyck’s Katalog,
pp. 227-230, 340, and Nachtrag, pp. 42, 43.





PART II. 


TRANSCENDENTAL NUMBERS AND THE QUADRATURE OF THE CIRCLE.


CHAPTER I. 

Cantor's Demonstration of the Existence of 
Transcendental Numbers. 

1. Let us represent numbers as usual by points upon the 
axis of abscissas. If we restrict ourselves to rational numbers 
the corresponding points will fill the axis of abscissas densely
throughout (überall dicht), i.e., in any interval no matter how
small there is an infinite number of such points. Nevertheless,
as the ancients had already discovered, the continuum
of points upon the axis is not exhausted in this way ; between 
the rational numbers come in the irrational numbers, and the 
question arises whether there are not distinctions to be made 
among the irrational numbers. 

Let us define first what we mean by algebraic numbers. 
Every root of an algebraic equation 

$$a_0\omega^n + a_1\omega^{n-1} + \cdots + a_{n-1}\omega + a_n = 0$$

with integral coefficients is called an algebraic number. Of 
course we consider only the real roots. Rational numbers 
occur as a special case in equations of the form 

$$a_0\omega + a_1 = 0.$$




We now ask the question: Does the totality of real 
algebraic numbers form a continuum, or a discrete series 
such that other numbers may be inserted in the intervals ? 
These new numbers, the so-called transcendental numbers, 
would then be characterized by this property, that they cannot 
be roots of an algebraic equation with integral coefficients. 

This question was answered first by Liouville ( Comptes 
rendus, 1844, and Liouville’s Journal, Vol. XVI, 1851), and 
in fact the existence of transcendental numbers was demon- 
strated by him. But his demonstration, which rests upon the 
theory of continued fractions, is rather complicated. The 
investigation is notably simplified by using the developments 
given by Georg Cantor in a memoir of fundamental importance,
Ueber eine Eigenschaft des Inbegriffes reeller algebraischer
Zahlen (Crelle’s Journal, Vol. LXXVII, 1873). We
shall give his demonstration, making use of a more simple 
notion which Cantor, under a different form, it is true, sug- 
gested at the meeting of naturalists in Halle, 1891. 

2. The demonstration rests upon the fact that algebraic 
numbers form a countable mass, while transcendental numbers 
do not. By this Cantor means that the former can be arranged 
in a certain order so that each of them occupies a definite 
place, is numbered, so to speak. This proposition may be 
stated as follows : 

The manifoldness of real algebraic numbers and the mani- 
foldness of positive integers can be brought into a one-to-one 
correspondence. 

We seem here to meet a contradiction. The positive inte- 
gers form only a portion of the algebraic numbers ; since 
each number of the first can be associated with one and one 
only of the second, the part would be equal to the whole. 
This objection rests upon a false analogy. The proposition 
that the part is always less than the whole is not true for 




infinite masses. It is evident, for example, that we may 
establish a one-to-one correspondence between the aggregate 
of positive integers and the aggregate of positive even num- 
bers, thus: 

$$\begin{array}{ccccccc} 0 & 1 & 2 & 3 & \cdots & n & \cdots \\ 0 & 2 & 4 & 6 & \cdots & 2n & \cdots \end{array}$$

In dealing with infinite masses, the words great and small are
inappropriate. As a substitute, Cantor has introduced the
word power (Mächtigkeit), and says: Two infinite masses have
the same power when they can be brought into a one-to-one
correspondence with each other. The theorem which we have to
prove then takes the following form: The aggregate of real
algebraic numbers has the same power as the aggregate of
positive integers.

We obtain the aggregate of real algebraic numbers by seek- 
ing the real roots of all algebraic equations of the form 

$$a_0\omega^n + a_1\omega^{n-1} + \cdots + a_{n-1}\omega + a_n = 0;$$

all the a’s are supposed prime to one another, $a_0$ positive,
and the equation irreducible. To arrange the numbers thus
obtained in a definite order, we consider their height N as
defined by

$$N = n - 1 + |a_0| + |a_1| + \cdots + |a_n|,$$

$|a_i|$ representing the absolute value of $a_i$, as usual. To a
given number N corresponds a finite number of algebraic 
equations. For, N being given, the number n has certainly 
an upper limit, since N is equal to n — 1 increased by positive 
numbers; moreover, the difference N — (n — 1) is a sum of 
positive numbers prime to one another, whose number is 
obviously finite. 




 N    n    |a_0| |a_1| |a_2| |a_3| |a_4|   Equation             φ(N)   Roots

 1    1      1     0                       x = 0                  1     0
      2      0     0     0                 -

 2    1      2     0                       -                      2    -1
             1     1                       x ± 1 = 0                   +1
      2      1     0     0                 -

 3    1      3     0                       -                      4    -2
             2     1                       2x ± 1 = 0                  -1/2
             1     2                       x ± 2 = 0                   +1/2
      2      2     0     0                 -                           +2
             1     1     0                 -
             1     0     1                 -
      3      1     0     0     0           -

 4    1      4     0                       -                     12    -3
             3     1                       3x ± 1 = 0                  -1.61803
             2     2                       -                           -1.41421
             1     3                       x ± 3 = 0                   -0.70711
      2      3     0     0                 -                           -0.61803
             2     1     0                 -                           -0.33333
             2     0     1                 2x^2 - 1 = 0                +0.33333
             1     2     0                 -                           +0.61803
             1     1     1                 x^2 ± x - 1 = 0             +0.70711
             1     0     2                 x^2 - 2 = 0                 +1.41421
      3      2     0     0     0           -                           +1.61803
             1     1     0     0           -                           +3
             1     0     1     0           -
             1     0     0     1           -
      4      1     0     0     0     0     -


Among these equations we must discard those that are 
reducible, which presents no theoretical difficulty. Since 
the number of equations corresponding to a given value of 




N is limited, there corresponds to a determinate N only a
finite mass of algebraic numbers. We shall designate this
by $\phi(N)$. The table contains the values of $\phi(1)$, $\phi(2)$,
$\phi(3)$, $\phi(4)$, and the corresponding algebraic numbers $\omega$.

We arrange now the algebraic numbers according to their
height, N, and the numbers corresponding to a single value of
N in increasing magnitude. We thus obtain all the algebraic
numbers, each in a determinate place. This is done in the
last column of the accompanying table. It is, therefore,
evident that algebraic numbers can be counted.
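
The counting by height can be reproduced mechanically for the small heights of the table. The following is a minimal sketch under stated assumptions: all function names are hypothetical, irreducibility is decided by the rational-root test (which is exact only for degrees 2 and 3), and the code is therefore meant only for N not exceeding 4. It counts the real roots of the admissible equations of each height.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import gcd


def divisors(m):
    """Positive divisors of a nonzero integer m."""
    m = abs(m)
    return [d for d in range(1, m + 1) if m % d == 0]


def has_rational_root(coeffs):
    """Rational-root test; coeffs = (a0, ..., an), a0 the leading coefficient."""
    if coeffs[-1] == 0:
        return True  # zero is a root
    degree = len(coeffs) - 1
    for p in divisors(coeffs[-1]):
        for q in divisors(coeffs[0]):
            for cand in (Fraction(p, q), Fraction(-p, q)):
                if sum(c * cand ** (degree - i) for i, c in enumerate(coeffs)) == 0:
                    return True
    return False


def real_root_count(coeffs):
    """Number of real roots of an irreducible equation of degree 1, 2 or 3."""
    degree = len(coeffs) - 1
    if degree == 1:
        return 1
    if degree == 2:
        a, b, c = coeffs
        return 2 if b * b - 4 * a * c > 0 else 0
    a, b, c, d = coeffs
    disc = 18*a*b*c*d - 4*b**3*d + b*b*c*c - 4*a*c**3 - 27*a*a*d*d
    return 3 if disc > 0 else 1


def phi(N):
    """Count the real algebraic numbers of height N (sketch for N <= 4)."""
    total = 0
    for n in range(1, N + 1):
        s = N - (n - 1)  # required value of |a0| + ... + |an|
        for coeffs in product(range(-s, s + 1), repeat=n + 1):
            if coeffs[0] <= 0 or sum(map(abs, coeffs)) != s:
                continue
            if reduce(gcd, coeffs) != 1:
                continue  # coefficients not prime to one another
            if n > 1 and has_rational_root(coeffs):
                continue  # reducible equation, discarded
            if n > 3:
                raise NotImplementedError("sketch covers degrees 1 to 3 only")
            total += real_root_count(coeffs)
    return total


print([phi(N) for N in range(1, 5)])  # [1, 2, 4, 12]
```

The printed counts agree with the values of $\phi(1)$ to $\phi(4)$ in the table; for larger heights a genuine irreducibility test would be needed.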

3. We now state the general proposition : 

In any portion of the axis of abscissas, however small, there
is an infinite number of points which certainly do not belong to
a given countable mass.

Or, in other words : 

The continuum of numerical values represented by a portion 
of the axis of abscissas, however small, has a greater power 
than any given countable mass. 

This amounts to affirming the existence of transcendental 
numbers. It is sufficient to take as the countable mass the 
aggregate of algebraic numbers. 

To demonstrate this theorem we prepare a table of algebraic 
numbers as before and write in it all the numbers in the form 
of decimal fractions. None of these will end in an infinite 
series of 9’s. For the equality

$$1 = 0.999\ldots$$

shows that such a number is an exact decimal. If now we
can construct a decimal fraction which is not found in our
table and does not end in an infinite series of 9’s it will
certainly be a transcendental number. By means of a very 
simple process indicated by Georg Cantor we can find not 
only one but infinitely many transcendental numbers, even 




when the domain in which the number is to lie is very small.
Suppose, for example, that the first five decimals of the number
are given. Cantor’s process is as follows.

Take for 6th decimal a number different from 9 and from
the 6th decimal of the first algebraic number, for 7th decimal
a number different from 9 and from the 7th decimal of the
second algebraic number, etc. In this way we obtain a decimal
fraction which will not end in an infinite series of 9’s and is
certainly not contained in our table. The proposition is then
demonstrated.
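
Cantor's process is easy to imitate on a finite stretch of a table. The sketch below is only an illustration: the function name, the sample table of decimal strings, and the five prescribed decimals are all assumptions made for the example, not data from the text. The k-th listed number is avoided at the (5 + k)-th decimal place, and the digit 9 is never used.

```python
def cantor_number(table, prefix):
    """Return the decimals of a number that differs from every entry of `table`.

    table  : list of strings, the decimals (after the point) of the listed numbers
    prefix : string of prescribed leading decimals
    """
    digits = list(prefix)
    for k, listed in enumerate(table):
        pos = len(prefix) + k                # decimal place to be altered
        forbidden = {"9", listed[pos]}
        digits.append(next(d for d in "0123456789" if d not in forbidden))
    return "".join(digits)


# Hypothetical sample table: leading decimals of four listed numbers.
table = ["0000000000", "5000000000", "3333333333", "4142135623"]
prefix = "14159"  # five decimals prescribed in advance (an arbitrary choice)

result = cantor_number(table, prefix)
print("0." + result)
```

Only a finite table can be processed in this way; the argument of the text applies the same rule to every place of the infinite table in turn.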

We see by this that (if the expression is allowable) there 
are far more transcendental numbers than algebraic. For 
when we determine the unknown decimals, avoiding the 9’s, 
we have a choice among eight different numbers; we can
thus form, so to speak, $8^{\infty}$ transcendental numbers, even when
the domain in which they are to lie is as small as we please.


CHAPTER II. 


Historical Survey of the Attempts at the Computation
and Construction of $\pi$.

In the next chapter we shall prove that the number $\pi$
belongs to the class of transcendental numbers whose exis- 
tence was shown in the preceding chapter. The proof was 
first given by Lindemann in 1882, and thus a problem was 
definitely settled which, so far as our knowledge goes, has 
occupied the attention of mathematicians for nearly 4000 
years, the problem of the quadrature of the circle. 

For, if the number $\pi$ is not algebraic, it certainly cannot
be constructed by means of straight edge and compasses. 
The quadrature of the circle in the sense understood by the 
ancients is then impossible. It is extremely interesting to 
follow the fortunes of this problem in the various epochs of 
science, as ever new attempts were made to find a solution 
with straight edge and compasses, and to see how these neces- 
sarily fruitless efforts worked for advancement in the mani- 
fold realm of mathematics. 

The following brief historical survey is based upon the
excellent work of Rudio: Archimedes, Huygens, Lambert,
Legendre, Vier Abhandlungen über die Kreismessung, Leipzig,
1892. This book contains a German translation of the 
investigations of the authors named. While the mode of 
presentation does not touch upon the modern methods here 
discussed, the book includes many interesting details which 
are of practical value in elementary teaching. 




1. Among the attempts to determine the ratio of the 
diameter to the circumference we may first distinguish the 
empirical stage, in which the desired end was to be attained by 
measurement or by direct estimation. 

One of the oldest known mathematical documents, the Rhind
Papyrus (c. 1650 B.C.), contains the problem in the
well-known form, to transform a circle into a square of
equal area. The writer of the papyrus lays down the
following rule: Cut off $\frac{1}{9}$ of a diameter and construct a
square upon the remainder; this has the same area as the
circle. The value of $\pi$ thus obtained is
$\left(\frac{16}{9}\right)^2 = 3.16\ldots$, not very inaccurate. Much
less accurate is the value $\pi = 3$,
used in the Bible (1 Kings, 7. 23, 2 Chronicles, 4. 2).

2. The Greeks rose above this empirical standpoint, and
especially Archimedes, who, in his work κύκλου μέτρησις,
computed the area of the circle by the aid of inscribed and
circumscribed polygons, as is still done in the schools. His
method remained in use till the invention of the differential
calculus; it was especially developed and rendered practical
by Huygens in his work, De circuli magnitudine inventa
(1654).

As in the case of the duplication of the cube and the 
trisection of the angle the Greeks sought also to effect the 
quadrature of the circle by the help of higher curves. 

Consider for example the curve $y = \sin^{-1} x$, which represents
the sinusoid with axis vertical. Geometrically, $\pi$
appears as a particular ordinate of this curve; from the
standpoint of the theory of functions, as a particular value of
our transcendental function. Any apparatus which describes
a transcendental curve we shall call a transcendental apparatus.
A transcendental apparatus which traces the sinusoid
gives us a geometric construction of $\pi$.

In modern language the curve $y = \sin^{-1} x$ is called an
integral curve because it can be defined by means of the
integral of an algebraic function,

$$\int \frac{dx}{\sqrt{1 - x^2}}.$$

The ancients called such a curve a quadratrix or
τετραγωνίζουσα. The best known is the quadratrix of Dinostratus
(c. 350 B.C.) which, however, had already been constructed by
Hippias of Elis (c. 420 B.C.) for the trisection of an angle.
Geometrically it may be defined as follows. Having given a
circle and two perpendicular radii OA and OB, two points M and
L move with constant velocity, one upon the radius OB, the
other upon the arc AB (Fig. 14). Starting at the same time at O
and A, they arrive simultaneously at B. The point of intersection
P of OL and the parallel to OA through M describes
the quadratrix.

Fig. 14.

From this definition it follows that y is proportional to $\theta$.
Further, since for $y = 1$, $\theta = \frac{\pi}{2}$, we have

$$\theta = \frac{\pi}{2}\,y,$$

and from $\theta = \tan^{-1}\frac{y}{x}$ the equation of the curve becomes

$$\frac{y}{x} = \tan\frac{\pi}{2}y.$$

It meets the axis of X at the point whose abscissa is

$$x = \lim \frac{y}{\tan\frac{\pi}{2}y}, \quad \text{for } y = 0;$$

hence

$$x = \frac{2}{\pi}.$$


According to this formula the radius of the circle is the 
mean proportional between the length of the quadrant and 
the abscissa of the intersection of the quadratrix with the 
axis of X. This curve can therefore be used for the rectifica- 
tion and hence also for the quadrature of the circle. This 
use of the quadratrix amounts, however, simply to a geo- 
metric formulation of the problem of rectification so long as 
we have no apparatus for describing the curve by continuous 
movement. 
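
The limit that fixes the intersection with the axis of X can be watched numerically. The following is a small sketch with a hypothetical helper name, assuming a circle of radius 1 as in the text; it evaluates the abscissa of the quadratrix for ordinates approaching zero and compares it with $2/\pi$.

```python
import math


def quadratrix_x(y):
    """Abscissa of the quadratrix point with ordinate y, from y / x = tan(pi y / 2)."""
    return y / math.tan(math.pi * y / 2)


for y in (0.1, 0.01, 0.001):
    print(y, quadratrix_x(y))
print("2/pi =", 2 / math.pi)

# Mean proportional: radius^2 = (length of quadrant) * (abscissa of intersection).
quadrant = math.pi / 2
print(quadrant * (2 / math.pi))
```

The abscissas settle toward $2/\pi$, and the product of the quadrant with this abscissa returns the square of the radius, which is the mean-proportional statement of the text.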

Fig. 15 gives an idea of the form of the curve with the
branches obtained by taking values of $\theta$ greater than $\pi$ or
less than $-\pi$. Evidently the quadratrix of Dinostratus is
not so convenient as the curve $y = \sin^{-1} x$, but it does not
appear that the latter was used by the ancients.

Fig. 15.

3. The period from 1670 to 1770, characterized by the 
names of Leibnitz, Newton, and Euler, saw the rise of modern 
analysis. Great discoveries followed one another in such an 
almost unbroken series that, as was natural, critical rigor fell 
into the background. For our purposes the development
of the theory of series is especially important. Numerous
methods were deduced for approximating the value of $\pi$. It
will suffice to mention the so-called Leibnitz series (known,
however, before Leibnitz):

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$

This same period brings the discovery of the mutual dependence
of e and $\pi$. The number e, natural logarithms, and
hence the exponential function, are first found in principle in
the works of Napier (1614). This number seemed at first to
have no relation whatever to the circular functions and the
number $\pi$ until Euler had the courage to make use of imaginary
exponents. In this way he arrived at the celebrated
formula

$$e^{ix} = \cos x + i \sin x,$$

which, for $x = \pi$, becomes

$$e^{i\pi} = -1.$$

This formula is certainly one of the most remarkable in all
mathematics. The modern proofs of the transcendence of $\pi$
are all based on it, since the first step is always to show the
transcendence of e.

4. After 1770 critical rigor gradually began to resume its 
rightful place. In this year appeared the work of Lambert:
Vorläufige Kenntnisse für die, so die Quadratur des Cirkuls
suchen. Among other matters the irrationality of $\pi$ is discussed.
In 1794 Legendre, in his Éléments de géométrie,
showed conclusively that $\pi$ and $\pi^2$ are irrational numbers.

5. But a whole century elapsed before the question was
investigated from the modern point of view. The starting-point
was the work of Hermite: Sur la fonction exponentielle
(Comptes rendus, 1873, published separately in 1874). The
transcendence of e is here proved.




An analogous proof for $\pi$, closely related to that of
Hermite, was given by Lindemann: Ueber die Zahl $\pi$
(Mathematische Annalen, XX, 1882. See also the Proceedings
of the Berlin and Paris academies).

The question was then settled for the first time, but the 
investigations of Hermite and Lindemann were still very 
complicated. 

The first simplification was given by Weierstrass in the
Berliner Berichte of 1885. The works previously mentioned
were embodied by Bachmann in his text-book, Vorlesungen
über die Natur der Irrationalzahlen, 1892.

But the spring of 1893 brought new and very important
simplifications. In the first rank should be named the
memoirs of Hilbert in the Göttinger Nachrichten. Still
Hilbert’s proof is not absolutely elementary: there remain
traces of Hermite’s reasoning in the use of the integral

$$\int_0^\infty z^{\rho} e^{-z}\,dz = \rho!$$

But Hurwitz and Gordan soon showed that this transcendental
formula could be done away with (Göttinger Nachrichten;
Comptes rendus; all three papers are reproduced
with some extensions in Mathematische Annalen, Vol. XLIII).

The demonstration has now taken a form so elementary 
that it seems generally available. In substance we shall 
follow Gordan’s mode of treatment. 


CHAPTER III. 


The Transcendence of the Number e. 


1. We take as the starting-point for our investigation the
well-known series

$$e^x = 1 + \frac{x}{1} + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots,$$

which is convergent for all finite values of x. The difference
between practical and theoretical convergence should here be
insisted on. Thus, for x = 1000 the calculation of $e^{1000}$ by
means of this series would obviously not be feasible. Still
the series certainly converges theoretically; for we easily
see that after the 1000th term the factorial n! in the
denominator increases more rapidly than the power which
occurs in the numerator. This circumstance that $\frac{x^n}{n!}$ has for
any finite value of x the limit zero when n becomes infinite
has an important bearing upon our later demonstrations.

We now propose to establish the following proposition: 

The number e is not an algebraic number, i.e., an equation
with integral coefficients of the form

$$F(e) = C_0 + C_1 e + C_2 e^2 + \cdots + C_n e^n = 0$$

is impossible. The coefficients $C_i$ may be supposed prime to
one another.

We shall use the indirect method of demonstration, show- 
ing that the assumption of the above equation leads to an 
absurdity. The absurdity may be shown in the following 




way. We multiply the members of the equation F(e) = 0 by
a certain integer M so that

$$M F(e) = M C_0 + M C_1 e + M C_2 e^2 + \cdots + M C_n e^n = 0.$$

We shall show that the number M can be chosen so that

(1) Each of the products $Me, Me^2, \ldots, Me^n$ may be separated
into an entire part $M_k$ and a fractional part $\epsilon_k$, and our
equation takes the form

$$M F(e) = M C_0 + M_1 C_1 + M_2 C_2 + \cdots + M_n C_n + C_1\epsilon_1 + C_2\epsilon_2 + \cdots + C_n\epsilon_n = 0;$$

(2) The integral part

$$M C_0 + M_1 C_1 + \cdots + M_n C_n$$

is not zero. This will result from the fact that when divided
by a prime number it gives a remainder different from zero;

(3) The expression

$$C_1\epsilon_1 + C_2\epsilon_2 + \cdots + C_n\epsilon_n$$

can be made as small a fraction as we please.

These conditions being fulfilled, the equation assumed is 
manifestly impossible, since the sum of an integer different 
from zero, and a proper fraction, cannot equal zero. 

The salient point of the proof may be stated, though not 
quite accurately, as follows : 

With an exceedingly small error we may assume $e, e^2, \ldots, e^n$
proportional to integers which certainly do not satisfy our
assumed equation.

2. We shall make use in our proof of a symbol $h^r$ and a
certain polynomial $\phi(x)$.

The symbol $h^r$ is simply another notation for the factorial r!.
Thus, we shall write the series for $e^x$ in the form

$$e^x = 1 + \frac{x}{h} + \frac{x^2}{h^2} + \cdots + \frac{x^n}{h^n} + \cdots$$

The symbol has no deeper meaning; it simply enables us to
write in more compact form every formula containing powers
and factorials.

Suppose, e.g., we have given a developed polynomial

$$f(x) = \sum_r c_r x^r.$$

We represent by f(h), and write under the form $\sum_r c_r h^r$, the
sum

$$c_1 \cdot 1 + c_2 \cdot 2! + c_3 \cdot 3! + \cdots + c_n \cdot n!.$$

But if f(x) is not developed, then to calculate f(h) is to
develop this polynomial in powers of h and finally replace
$h^r$ by r!. Thus, for example,

$$f(k + h) = \sum_r c_r (k + h)^r = \sum_r c'_r \cdot h^r = \sum_r c'_r \cdot r!,$$

the $c'_r$ depending on k.

The polynomial $\phi(x)$ which we need for our proof is the
following remarkable expression

$$\phi(x) = x^{p-1}\,\frac{[(1 - x)(2 - x)\cdots(n - x)]^p}{(p - 1)!},$$

where p is a prime number, n the degree of the algebraic
equation assumed to be satisfied by e. We shall suppose p
greater than n and $|C_0|$, and later we shall make it increase
without limit.

To get a geometric picture of this polynomial $\phi(x)$ we construct
the curve

$$y = \phi(x).$$

At the points x = 1, 2, ..., n the curve has the axis of X as
an inflexional tangent, since it meets it in an odd number of
points, while at the origin the axis of X is tangent without
inflexion. For values of x between 0 and n the curve remains
in the neighborhood of the axis of X; for greater values of x
it recedes indefinitely.




Of the function $\phi(x)$ we will now establish three important
properties:

1. x being supposed given and p increasing without limit,
$\phi(x)$ tends toward zero, as does also the sum of the absolute
values of its terms.

Put $u = x(1 - x)(2 - x)\cdots(n - x)$; we may then write

$$\phi(x) = \frac{u^{p-1}}{(p - 1)!}\cdot\frac{u}{x},$$

which for p infinite tends toward zero.

To have the sum of the absolute values of $\phi(x)$ it is sufficient
to replace $-x$ by $|x|$ in the undeveloped form of $\phi(x)$.
The second part is then demonstrated like the first.

2. h being an integer, $\phi(h)$ is an integer not divisible by p
and therefore different from zero.

Develop $\phi(x)$ in increasing powers of x, noticing that the
terms of lowest and highest degree respectively are of degree
$p - 1$ and $np + p - 1$. We have

$$\phi(x) = \sum_{r=p-1}^{np+p-1} c_r x^r = \frac{c'\,x^{p-1}}{(p-1)!} + \frac{c''\,x^{p}}{(p-1)!} + \cdots \pm \frac{x^{np+p-1}}{(p-1)!}.$$

Hence

$$\phi(h) = \sum_{r=p-1}^{np+p-1} c_r h^r.$$

Leaving out of account the denominator $(p - 1)!$, which
occurs in all the terms, the coefficients $c_r$ are integers. This
denominator disappears as soon as we replace $h^r$ by r!, since
the factorial of least degree is $h^{p-1} = (p - 1)!$. All the terms
of the development after the first will contain the factor p.
As to the first, it may be written

$$\frac{(1 \cdot 2 \cdot 3 \cdots n)^p \cdot (p - 1)!}{(p - 1)!} = (n!)^p$$

and is certainly not divisible by p since p > n.

Therefore

$$\phi(h) \equiv (n!)^p \pmod{p},$$

and hence

$$\phi(h) \not\equiv 0 \pmod{p}.$$

Moreover, $\phi(h)$ is a very large number; even its last term
alone is very large, viz.:

$$\frac{(np + p - 1)!}{(p - 1)!} = p(p + 1)\cdots(np + p - 1).$$

3. h being an integer, and k one of the numbers 1, 2, ..., n,
$\phi(h + k)$ is an integer divisible by p.

We have

$$\phi(h + k) = \sum_r c_r (h + k)^r = \sum_r c'_r h^r,$$

a formula in which we are to replace $h^r$ by r! only after having
arranged the development in increasing powers of h.

According to the rules of the symbolic calculus, we have
first

$$\phi(h + k) = (h + k)^{p-1}\,\frac{[(1 - k - h)(2 - k - h)\cdots(-h)\cdots(n - k - h)]^p}{(p - 1)!}.$$

One of the factors in the brackets reduces to $-h$; hence the
term of lowest degree in h in the development is of degree p.
We may then write

$$\phi(h + k) = \sum_{r=p}^{np+p-1} c'_r h^r.$$

The coefficients still have for numerators integers and for
denominator $(p - 1)!$. As already explained, this denominator
disappears when we replace $h^r$ by r!. But now all the
terms of the development are divisible by p; for the first
may be written

$$\frac{(-1)^{kp}\,k^{p-1}\,[(k - 1)!\,(n - k)!]^p \cdot p!}{(p - 1)!} = (-1)^{kp}\,k^{p-1}\,[(k - 1)!\,(n - k)!]^p \cdot p.$$

$\phi(h + k)$ is then divisible by p.
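
Properties 2 and 3 can be tested on small cases by carrying out the symbolic calculus literally. The sketch below is an illustration under stated assumptions: the helper names are hypothetical and the values of n and p are arbitrary small choices with p a prime greater than n. It expands $(p-1)!\,\phi(x + k)$ as an integer polynomial, replaces each power $x^r$ by r!, and divides by $(p-1)!$.

```python
from math import factorial


def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (index = power)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def numerator(n, p, k=0):
    """Coefficients of (p - 1)! * phi(x + k) in increasing powers of x."""
    poly = [1]
    for _ in range(p - 1):
        poly = poly_mul(poly, [k, 1])            # factor (x + k)
    for j in range(1, n + 1):
        for _ in range(p):
            poly = poly_mul(poly, [j - k, -1])   # factor (j - k - x)
    return poly


def symbolic_value(n, p, k=0):
    """phi(h + k) after replacing h^r by r! (k = 0 gives phi(h))."""
    total = sum(c * factorial(r) for r, c in enumerate(numerator(n, p, k)))
    assert total % factorial(p - 1) == 0         # the denominator disappears
    return total // factorial(p - 1)


n, p = 2, 3  # illustrative choice: p prime, p > n
print(symbolic_value(n, p) % p, pow(factorial(n), p, p))
print([symbolic_value(n, p, k) % p for k in range(1, n + 1)])
```

The first line prints two equal nonzero residues, in agreement with $\phi(h) \equiv (n!)^p \pmod{p}$; the second prints only zeros, in agreement with the divisibility of $\phi(h + k)$ by p.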

3. We can now show that the equation

$$F(e) = C_0 + C_1 e + C_2 e^2 + \cdots + C_n e^n = 0$$

is impossible.


66 


FAMOUS PROBLEMS. 


For the number M, by which we multiply the members of 
this equation, we select $ (h), so that 

d> (h) F (e) = C „<£ (h)-|- Ci4> (h) e + C 2 <£ (h) e J + ■ • ■ 4- C n <^> (h)e°. 


Let us try to decompose any term, such as C k <£(h)e\ into an 
integer and a fraction. We have 

e k ' <#> (h) = e k V c r h r . 


Considering the series development of $e^k$, any term of this
sum, omitting the constant coefficient, has the form

$$e^k\,h^r = h^r + \frac{h^r k}{1} + \frac{h^r k^2}{2!} + \cdots + \frac{h^r k^r}{r!} + \frac{h^r k^{r+1}}{(r+1)!} + \cdots$$

Replacing $h^r$ by $r!$, or what amounts to the same thing, by one
of the quantities

$$r\,h^{r-1},\quad r(r-1)\,h^{r-2},\ \cdots,\quad r(r-1)\cdots 3\cdot h^2,\quad r(r-1)\cdots 2\cdot h,$$

and simplifying the successive fractions,

$$\begin{aligned}
e^k\,h^r ={}& h^r + \frac{r}{1}\,h^{r-1}k + \frac{r(r-1)}{1\cdot 2}\,h^{r-2}k^2 + \cdots + \frac{r}{1}\,h\,k^{r-1} + k^r \\
&+ k^r\left[\frac{k}{r+1} + \frac{k^2}{(r+1)(r+2)} + \cdots\right].
\end{aligned}$$

The first line has the same form as the development of
$(h+k)^r$; in the parenthesis of the second line we have the
series

$$0 + \frac{k}{r+1} + \frac{k^2}{(r+1)(r+2)} + \cdots$$

whose terms are respectively less than those of the series

$$e^k = 1 + k + \frac{k^2}{2!} + \frac{k^3}{3!} + \cdots$$

The second line in the expansion of $e^k\,h^r$ may therefore be
represented by an expression of the form

$$q_{r,k}\cdot e^k\cdot k^r,$$

$q_{r,k}$ being a proper fraction.
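The splitting of $e^k\,h^r$ into an integer and a small remainder can be illustrated numerically. The sketch below assumes a positive integer k and truncates the bracketed series after a fixed number of terms; `split_exp_term` is a hypothetical helper, not part of the original argument.

```python
from math import exp, factorial


def split_exp_term(r, k, terms=200):
    """Split e^k * r! into the symbolic value of (h + k)^r and a remainder.

    Returns (integer_part, remainder, q), where remainder = q * e^k * k^r.
    """
    # (h + k)^r with h^s replaced by s!: the sum over j of r!/j! * k^j.
    integer_part = sum(factorial(r) // factorial(j) * k**j for j in range(r + 1))
    # Bracketed series k/(r+1) + k^2/((r+1)(r+2)) + ..., truncated.
    bracket = 0.0
    term = 1.0
    for m in range(1, terms + 1):
        term *= k / (r + m)
        bracket += term
    remainder = k**r * bracket
    q = bracket / exp(k)
    return integer_part, remainder, q


if __name__ == "__main__":
    integer_part, remainder, q = split_exp_term(3, 2)
    print(integer_part, remainder, q, exp(2) * factorial(3))
```

For r = 3 and k = 2 the integer part is 38, and the sum of the two parts should reproduce $e^2\cdot 3!$ to within rounding, with q lying between 0 and 1.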


Effecting the same decomposition for each term of the sum

$$e^k \sum_r c_r h^r$$

it takes the form

$$e^k \sum_r c_r h^r = \sum_r c_r (h+k)^r + e^k \sum_r q_{r,k}\,c_r\,k^r.$$

The first part of this sum is simply $\phi(h+k)$; this is a
number divisible by p (2, 3). Further (2, 1),

$$\sum_r |c_r k^r|$$

tends toward zero when p becomes infinite: the same is true
a fortiori of $\sum_r q_{r,k}\,c_r\,k^r$, and also, since $e^k$ is a finite quantity,
of $e^k \sum_r q_{r,k}\,c_r\,k^r$, which we may represent by $\epsilon_k$.

The term under consideration, $C_k\,e^k\,\phi(h)$, has then been put
under the form of an integer $C_k\,\phi(h+k)$ and a quantity $C_k\,\epsilon_k$
which, by a suitable choice of p, may be made as small as we
please.

Proceeding similarly with all the terms, we get finally

$$\begin{aligned}
F(e)\,\phi(h) ={}& C_0\,\phi(h) + C_1\,\phi(h+1) + \cdots + C_n\,\phi(h+n) \\
&+ C_1\,\epsilon_1 + C_2\,\epsilon_2 + \cdots + C_n\,\epsilon_n.
\end{aligned}$$

It is now easy to complete the demonstration. All the
terms of the first line after the first are divisible by p; for
the first, $|C_0|$ is less than p; $\phi(h)$ is not divisible by p; hence
$C_0\,\phi(h)$ is not divisible by the prime number p. Consequently
the sum of the numbers of the first line is not zero.

The numbers of the second line are finite in number ; each 
of them can be made smaller than any given number by a 
suitable choice of p ; and therefore the same is true of their 
sum. 

Since an integer not zero and a fraction cannot have zero 
for a sum, the assumed equation is impossible. 

Thus, the transcendence of e, or Hermite’s Theorem, is 
demonstrated. 


CHAPTER IV. 


The Transcendence of the Number π.

1. The demonstration of the transcendence of the number
π given by Lindemann is an extension of Hermite’s proof in
the case of e. While Hermite shows that an integral equation
of the form

$$C_0 + C_1 e + C_2 e^2 + \cdots + C_n e^n = 0$$

cannot exist, Lindemann generalizes this by introducing in
place of the powers $e, e^2, \cdots$ sums of the form

$$e^{k_1} + e^{k_2} + \cdots + e^{k_N},$$
$$e^{l_1} + e^{l_2} + \cdots + e^{l_{N'}},$$

where the k’s are associated algebraic numbers, i.e., roots of
an algebraic equation, with integral coefficients, of the degree
N; the l’s roots of an equation of degree N', etc. Moreover,
some or all of these roots may be imaginary.

Lindemann’s general theorem may be stated as follows :

The number e cannot satisfy an equation of the form

$$C_0 + C_1\,(e^{k_1} + e^{k_2} + \cdots + e^{k_N}) + C_2\,(e^{l_1} + e^{l_2} + \cdots + e^{l_{N'}}) + \cdots = 0 \tag{1}$$

where the coefficients $C_i$ are integers and the exponents $k_i, l_i, \cdots$
are respectively associated algebraic numbers.

The theorem may also be stated : 

The number e is not only not an algebraic number and there- 
fore a transcendental number simply, but it is also not an 
interscendental * number and therefore a transcendental number 
of higher order. 

* Leibnitz calls a function $x^{\lambda}$, where $\lambda$ is an algebraic irrational, an
interscendental function.



Let

$$a\,x^{N} + a_1 x^{N-1} + \cdots + a_N = 0$$

be the equation having for roots the exponents $k_i$;

$$b\,x^{N'} + b_1 x^{N'-1} + \cdots + b_{N'} = 0$$

that having for roots the exponents $l_i$, etc. These equations
are not necessarily irreducible, nor the coefficients of the first
terms equal to 1. It follows that the symmetric functions of
the roots which alone occur in our later developments need
not be integers.

In order to obtain integral numbers it will be sufficient to
consider symmetric functions of the quantities

$$ak_1,\ ak_2,\ \cdots,\ ak_N,$$
$$bl_1,\ bl_2,\ \cdots,\ bl_{N'},\ \text{etc.}$$

These numbers are roots of the equations

$$y^{N} + a_1 y^{N-1} + a_2\,a\,y^{N-2} + \cdots + a_N\,a^{N-1} = 0,$$
$$y^{N'} + b_1 y^{N'-1} + b_2\,b\,y^{N'-2} + \cdots + b_{N'}\,b^{N'-1} = 0,\ \text{etc.}$$

These quantities are integral associated algebraic numbers,
and their rational symmetric functions real integers.
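The passage from the roots $k_i$ to the quantities $ak_i$ can be made concrete with a small example. The sketch below is illustrative only: the sample polynomial $6x^2 - 5x + 1$ (roots 1/2 and 1/3) and the function names are assumptions introduced here, not taken from the text.

```python
from fractions import Fraction


def scale_roots(coeffs):
    """Map [a, a_1, ..., a_N] to the monic [1, a_1, a_2*a, ..., a_N*a^(N-1)].

    The roots of the new polynomial are a times the roots of the old one.
    """
    a = coeffs[0]
    return [1] + [c * a ** (i - 1) for i, c in enumerate(coeffs) if i >= 1]


def evaluate(coeffs, x):
    """Horner evaluation; coefficients are listed from the highest degree down."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


if __name__ == "__main__":
    original = [6, -5, 1]  # 6x^2 - 5x + 1, with roots 1/2 and 1/3
    monic = scale_roots(original)
    print(monic)  # coefficients of y^2 - 5y + 6
    for root in (Fraction(1, 2), Fraction(1, 3)):
        print(evaluate(original, root), evaluate(monic, 6 * root))
```

Here the scaled roots 3 and 2 satisfy a monic equation with integral coefficients, which is the property the argument relies on.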

We shall now follow the same course as in the demonstra- 
tion of Hermite’s theorem. 

We assume equation (1) to be true; we multiply both
members by an integer M; and we decompose each sum,
such as

$$M\,(e^{k_1} + e^{k_2} + \cdots + e^{k_N}),$$

into an integral part and a fraction, thus

$$M\,(e^{k_1} + e^{k_2} + \cdots + e^{k_N}) = M_1 + \epsilon_1,$$
$$M\,(e^{l_1} + e^{l_2} + \cdots + e^{l_{N'}}) = M_2 + \epsilon_2,$$

Our equation then becomes

$$\begin{aligned}
& C_0 M + C_1 M_1 + C_2 M_2 + \cdots \\
&+ C_1 \epsilon_1 + C_2 \epsilon_2 + \cdots = 0.
\end{aligned}$$


We shall show that with a suitable choice of M the sum of 
the quantities in the first line represents an integer not 
divisible by a certain prime number p, and consequently 
different from zero ; that the fractional part can be made as 
small as we please, and thus we come upon the same contra- 
diction as before. 

2. We shall again use the symbol $h^r = r!$ and select as
the multiplier the quantity $M = \psi(h)$, where $\psi(x)$ is a
generalization of $\phi(x)$ used in the preceding chapter, formed as
follows :

$$\begin{aligned}
\psi(x) = \frac{x^{p-1}}{(p-1)!}\,&[(k_1-x)(k_2-x)\cdots(k_N-x)]^p\cdot a^{Np}\cdot a^{N'p}\cdot a^{N''p}\cdots \\
\cdot\,&[(l_1-x)(l_2-x)\cdots(l_{N'}-x)]^p\cdot b^{Np}\cdot b^{N'p}\cdot b^{N''p}\cdots
\end{aligned}$$

where p is a prime number greater than the absolute value of
each of the numbers

$$C_0,\ a,\ b,\ \cdots,\ a_N,\ b_{N'},\ \cdots$$

and later will be assumed to increase without limit. As to
the factors $a^{Np}, b^{Np}, \cdots$, they have been introduced so as to
have in the development of $\psi(x)$ symmetric functions of the
quantities

$$ak_1,\ ak_2,\ \cdots,\ ak_N,$$
$$bl_1,\ bl_2,\ \cdots,\ bl_{N'},$$

that is, rational integral numbers. Later on we shall have
to develop the expressions

$$\sum_{\nu} \psi(k_\nu + h),\quad \sum_{\nu} \psi(l_\nu + h),\ \cdots$$

The presence of these same factors will still be necessary if
we wish the coefficients of these developments to be integers
each divided by $(p-1)!$.

1. $\psi(h)$ is an integral number, not divisible by p and
consequently different from zero.

Arranging $\psi(h)$ in increasing powers of h, it takes the form

$$\psi(h) = \sum_{r=p-1}^{Np+N'p+\cdots+p-1} c_r h^r.$$

In this development all the coefficients have integral
numerators and the common denominator $(p-1)!$.

The coefficient of the first term $h^{p-1}$ may be written

$$\begin{aligned}
&\frac{1}{(p-1)!}\,(ak_1\cdot ak_2\cdots ak_N)^p\,a^{N'p}a^{N''p}\cdots\,(bl_1\cdot bl_2\cdots bl_{N'})^p\,b^{Np}b^{N''p}\cdots \\
&= \frac{(-1)^{Np+N'p+\cdots}}{(p-1)!}\,(a_N\,a^{N-1})^p\,a^{N'p}a^{N''p}\cdots\,(b_{N'}\,b^{N'-1})^p\,b^{Np}b^{N''p}\cdots
\end{aligned}$$

If in this term we replace $h^{p-1}$ by its value $(p-1)!$ the
denominator disappears. According to the hypotheses made
regarding the prime number p, no factor of the product is
divisible by p and hence the product is not.

The second term $c_p h^p$ becomes likewise an integer when
we replace $h^p$ by $p!$ but the factor p remains, and so for all
of the following terms. Hence $\psi(h)$ is an integer not divisible
by p.

2. For x, a given finite quantity, and p increasing without
limit, $\psi(x) = \sum_r c_r x^r$ tends toward zero, as does also the sum
$\sum_r |c_r x^r|$.

We may write

$$\psi(x) = \sum_r c_r x^r = \frac{x^{p-1}}{(p-1)!}\,\bigl[a^{N}a^{N'}\cdots b^{N}b^{N'}\cdots(k_1-x)(k_2-x)\cdots(k_N-x)\,(l_1-x)(l_2-x)\cdots(l_{N'}-x)\cdots\bigr]^p.$$

Since for x of given value the expression in brackets is a
constant, we may replace it by K. We then have

$$\psi(x) = \frac{(xK)^{p-1}}{(p-1)!}\cdot K,$$

a quantity which tends toward zero as p increases indefinitely.

The same reasoning will apply when each term of $\psi(x)$ is
replaced by its absolute value.

3. The expression $\sum_{\nu=1}^{N} \psi(k_\nu + h)$ is an integer divisible by p.

We have

$$\begin{aligned}
\psi(k_\nu + h) = \frac{(k_\nu+h)^{p-1}}{(p-1)!}\,& a^{Np}a^{N'p}a^{N''p}\cdots\,b^{Np}b^{N'p}b^{N''p}\cdots \\
\cdot\,&[(k_1-k_\nu-h)(k_2-k_\nu-h)\cdots(-h)\cdots(k_N-k_\nu-h)]^p \\
\cdot\,&[(l_1-k_\nu-h)(l_2-k_\nu-h)\cdots(l_{N'}-k_\nu-h)]^p\cdots
\end{aligned}$$

The ν-th factor of the expression in brackets in the second
line is $-h$, and hence the term of lowest degree in h is $h^p$.
Consequently

$$\psi(k_\nu + h) = \sum_{r=p}^{Np+N'p+\cdots+p-1} c'_r h^r,$$

whence

$$\sum_{\nu=1}^{N} \psi(k_\nu + h) = \sum_{r=p}^{Np+N'p+\cdots+p-1} C'_r h^r.$$

The numerators of the coefficients $C'_r$ are rational and integral,
for they are integral symmetric functions of the quantities

$$ak_1,\ ak_2,\ \cdots,\ ak_N,$$
$$bl_1,\ bl_2,\ \cdots,\ bl_{N'},$$

and their common denominator is $(p-1)!$.

If we replace $h^r$ by $r!$ the denominator disappears from all
the coefficients, the factor p remains in every term, and hence
the sum is an integer divisible by p.

Similarly for

$$\sum_{\nu=1}^{N'} \psi(l_\nu + h),\ \cdots$$

We have thus established three properties of $\psi(x)$ analogous
to those demonstrated for $\phi(x)$ in connection with Hermite’s
theorem.


3. We now return to our demonstration that the assumed
equation

$$C_0 + C_1\,(e^{k_1} + e^{k_2} + \cdots + e^{k_N}) + C_2\,(e^{l_1} + e^{l_2} + \cdots + e^{l_{N'}}) + \cdots = 0 \tag{1}$$

cannot be true. For this purpose we multiply both members
by $\psi(h)$, thus obtaining

$$C_0\,\psi(h) + C_1\,[e^{k_1}\psi(h) + e^{k_2}\psi(h) + \cdots + e^{k_N}\psi(h)] + \cdots = 0,$$

and try to decompose each of the expressions in brackets into
a whole number and a fraction. The operation will be a little
longer than before, for k may be a complex number of the form
$k = k' + ik''$. We shall need to introduce $|k| = +\sqrt{k'^2 + k''^2}$.
One term of the above sum is

$$e^k\,\psi(h) = e^k \sum_r c_r h^r = \sum_r c_r\cdot e^k\cdot h^r.$$

r r 

The product $e^k\cdot h^r$ may be written, as shown before,

$$e^k\,h^r = (h+k)^r + k^r\left[\frac{k}{r+1} + \frac{k^2}{(r+1)(r+2)} + \cdots\right].$$

The absolute value of every term of the series

$$0 + \frac{k}{r+1} + \frac{k^2}{(r+1)(r+2)} + \cdots$$

is less than the absolute value of the corresponding term in
the series

$$e^k = 1 + \frac{k}{1} + \frac{k^2}{2!} + \cdots$$

Hence

$$\left|\frac{k}{r+1} + \frac{k^2}{(r+1)(r+2)} + \cdots\right| < e^{|k|},$$

or

$$\frac{k}{r+1} + \frac{k^2}{(r+1)(r+2)} + \cdots = q_{r,k}\,e^{|k|},$$

$q_{r,k}$ being a complex quantity whose absolute value is less
than 1.

We may then write

$$\begin{aligned}
e^k\,\psi(h) = \sum_r c_r\,e^k\,h^r &= \sum_r c_r (h+k)^r + \sum_r c_r\,q_{r,k}\,k^r\,e^{|k|} \\
&= \psi(h+k) + \sum_r c_r\,q_{r,k}\,k^r\,e^{|k|}.
\end{aligned}$$


By giving k in succession the indices 1, 2, ⋯, N, and forming
the sum the equation becomes

$$e^{k_1}\psi(h) + e^{k_2}\psi(h) + \cdots + e^{k_N}\psi(h) = \sum_{\nu=1}^{N} \psi(k_\nu + h) + \sum_{\nu=1}^{N}\Bigl\{ e^{|k_\nu|} \sum_r c_r\,k_\nu^{\,r}\,q_{r,k_\nu} \Bigr\}.$$

Proceeding similarly with all the other sums, our equation
takes the form

$$\begin{aligned}
& C_0\,\psi(h) + C_1 \sum_{\nu=1}^{N} \psi(k_\nu + h) + C_2 \sum_{\nu=1}^{N'} \psi(l_\nu + h) + \cdots \\
&+ C_1 \sum_{\nu=1}^{N}\sum_r e^{|k_\nu|}\,c_r\,k_\nu^{\,r}\,q_{r,k_\nu} + C_2 \sum_{\nu=1}^{N'}\sum_r e^{|l_\nu|}\,c_r\,l_\nu^{\,r}\,q_{r,l_\nu} + \cdots = 0.
\end{aligned} \tag{2}$$

By property 2 we can make $\sum_r |c_r k^r|$ as small as we please by taking
p sufficiently great. Since $|q_{r,k}| < 1$, this will be true a fortiori
of

$$\Bigl|\sum_r c_r\,k^r\,q_{r,k}\Bigr|,$$

and hence also of

$$\Bigl|\sum_{\nu=1}^{N}\sum_r c_r\,k_\nu^{\,r}\,q_{r,k_\nu}\,e^{|k_\nu|}\Bigr|.$$

Since the coefficients C are finite in value and in number, the
sum which occurs in the second line of (2) can, by increasing
p, be made as small as we please.

The numbers of the first line are, after the first, all divisible
by p (3), but the first number, $C_0\,\psi(h)$, is not (1).
Therefore the sum of the numbers in the first line is not
divisible by p and hence is different from zero. The sum of
an integer and a fraction cannot be zero. Hence equation (2)
is impossible and consequently also equation (1).*

* The proof for the more general case where $C_0 = 0$ may be reduced
to this by multiplication by a suitable factor, or may be obtained directly
by a proper modification of $\psi(h)$.

4. We now come to a proposition more general than the
preceding, but whose demonstration is an immediate consequence
of the latter. For this reason we shall call it Lindemann’s
corollary.

The number e cannot satisfy an equation of the form

$$C'_0 + C'_1 e^{k_1} + C'_2 e^{l_1} + \cdots = 0, \tag{3}$$

in which the coefficients are integers even when the exponents
$k_1, l_1, \cdots$ are unrelated algebraic numbers.

To demonstrate this, let $k_2, k_3, \cdots, k_N$ be the other roots of
the equation satisfied by $k_1$; similarly for $l_2, l_3, \cdots, l_{N'}$, etc.
Form all the polynomials which may be deduced from (3)
by replacing $k_1$ in succession by the associated roots $k_2, \cdots$,
$l_1$ by the associated roots $l_2, \cdots$. Multiplying the expressions
thus formed we have the product

$$\begin{aligned}
\prod_{\alpha,\beta,\cdots} \{C'_0 + C'_1 e^{k_\alpha} + C'_2 e^{l_\beta} + \cdots\}
\qquad & \alpha = 1, 2, \cdots, N;\quad \beta = 1, 2, \cdots, N' \\
= C_0 &+ C_1\,(e^{k_1} + e^{k_2} + \cdots + e^{k_N}) + C_2\,(e^{k_1+k_2} + e^{k_2+k_3} + \cdots) \\
&+ C_3\,(e^{k_1+l_1} + e^{k_1+l_2} + \cdots) + \cdots
\end{aligned}$$


In each parenthesis the exponents are formed symmetrically
from the quantities $k_i, l_i, \cdots$, and are therefore roots of an
algebraic equation with integral coefficients. Our product 
comes under Lindemann’s theorem ; hence it cannot be zero. 
Consequently none of its factors can be zero and the corollary 
is demonstrated. 

We may now deduce a still more general theorem. 

The number e cannot satisfy an equation of the form

$$C_0^{(1)} + C_1^{(1)} e^{k} + C_2^{(1)} e^{l} + \cdots = 0$$

where the coefficients as well as the exponents are unrelated
algebraic numbers.

For, let us form all the polynomials which we can deduce
from the preceding when for each of the expressions $C_i^{(1)}$ we
substitute one of the associated algebraic numbers

$$C_i^{(2)},\ C_i^{(3)},\ \cdots,\ C_i^{(N_i)}.$$

If we multiply the polynomials thus formed together we get
the product

$$\begin{aligned}
\prod_{\alpha,\beta,\gamma,\cdots} \{C_0^{(\alpha)} + C_1^{(\beta)} e^{k} + C_2^{(\gamma)} e^{l} + \cdots\}
\qquad & \alpha = 1, 2, \cdots, N_0;\quad \beta = 1, 2, \cdots, N_1;\quad \gamma = 1, 2, \cdots, N_2 \\
= C_0 &+ C_k e^{k} + C_l e^{l} + \cdots \\
&+ C_{k,k}\,e^{k+k} + C_{k,l}\,e^{k+l} + \cdots \\
&+ \cdots
\end{aligned}$$

where the coefficients C are integral symmetric functions of
the quantities

$$C_0^{(1)},\ C_0^{(2)},\ \cdots,\ C_0^{(N_0)},$$
$$C_1^{(1)},\ C_1^{(2)},\ \cdots,\ C_1^{(N_1)},\ \cdots$$



and hence are rational. By the previous proof such an 
expression cannot vanish, and we have accordingly Linde- 
mann’s corollary in its most general form : 

The number e cannot satisfy an equation of the form

$$C_0 + C_1 e^{k} + C_2 e^{l} + \cdots = 0$$

where the exponents $k, l, \cdots$ as well as the coefficients $C_0, C_1, \cdots$
are algebraic numbers.

This may also be stated as follows :

In an equation of the form

$$C_0 + C_1 e^{k} + C_2 e^{l} + \cdots = 0$$

the exponents and coefficients cannot all be algebraic numbers.

5. From Lindemann’s corollary we may deduce a number
of interesting results. First, the transcendence of π is an
immediate consequence. For consider the remarkable equation

$$1 + e^{i\pi} = 0.$$

The coefficients of this equation are algebraic; hence the
exponent $i\pi$ is not. Therefore, π is transcendental.

6. Again consider the function $y = e^x$. We know that
$1 = e^0$. This seems to be contrary to our theorems about the
transcendence of e. This is not the case, however. We
must notice that the case of the exponent 0 was implicitly
excluded. For the exponent 0 the function $\psi(x)$ would lose
its essential properties and obviously our conclusions would
not hold.

Excluding then the special case (x = 0, y = 1), Lindemann’s
corollary shows that in the equation $y = e^x$ or $x = \log_e y$, y and
x, i.e., the number and its natural logarithm, cannot be algebraic
simultaneously. To an algebraic value of x corresponds
a transcendental value of y, and conversely. This is certainly
a very remarkable property.

If we construct the curve $y = e^x$ and mark all the algebraic
points of the plane, i.e., all points whose coordinates are algebraic
numbers, the curve passes among them without meeting
a single one except the point x = 0, y = 1. The theorem still
holds even when x and y take arbitrary complex values. The
exponential curve is then transcendental in a far higher sense
than ordinarily supposed.

7. A further consequence of Lindemann’s corollary is the
transcendence, in the same higher sense, of the function
$y = \sin^{-1} x$ and similar functions.

The function $y = \sin^{-1} x$ is defined by the equation

$$2ix = e^{iy} - e^{-iy}.$$

We see, therefore, that here also x and y cannot be algebraic
simultaneously, excluding, of course, the values x = 0, y = 0.
We may then enunciate the proposition in geometric form :

The curve $y = \sin^{-1} x$, like the curve $y = e^x$, passes through
no algebraic point of the plane, except x = 0, y = 0.


CHAPTER V. 

The Integraph and the Geometric Construction of π.

1. Lindemann’s theorem demonstrates the transcendence
of π, and thus is shown the impossibility of solving the old
problem of the quadrature of the circle, not only in the sense
understood by the ancients but in a far more general manner.
It is not only impossible to construct π with straight edge
and compasses, but there is not even a curve of higher order
defined by an integral algebraic equation for which π is the
ordinate corresponding to a rational value of the abscissa.
An actual construction of π can then be effected only by the
aid of a transcendental curve. If such a construction is
desired, we must use besides straight edge and compasses
a “transcendental” apparatus which shall trace the curve by
continuous motion.

2. Such an apparatus is the integraph, recently invented
and described by a Russian engineer, Abdank-Abakanowicz,
and constructed by Coradi of Zürich.

This instrument enables us to trace the integral curve

$$Y = F(x) = \int f(x)\,dx$$
when we have given the differential curve 

y = f(x). 

For this purpose, we move the linkwork of the integraph 
so that the guiding point follows the differential curve ; the 
tracing point will then trace the integral curve. For a fuller 
description of this ingenious instrument we refer to the 
original memoir (in German, Teubner, 1889 ; in French, 
Gauthier-Villars, 1889). 




We shall simply indicate the principles of its working. 
For any point (x, y) of the differential curve construct the 
auxiliary triangle having for vertices the points (x, y), (x, 0), 
(x — 1, 0); the hypotenuse of this right-angled triangle makes 
with the axis of X an angle whose tangent = y. 

Hence, this hypotenuse is parallel to the tangent to the inte- 
gral curve at the point (X, Y) corresponding to the point (x, y). 



The apparatus should be so constructed then that the trac- 
ing point shall move parallel to the variable direction of this 
hypotenuse, while the guiding point describes the differential 
curve. This is effected by connecting the tracing point with 
a sharp-edged roller whose plane is vertical and moves so as to 
be always parallel to this hypotenuse. A weight presses this 
roller firmly upon the paper so that its point of contact can 
advance only in the plane of the roller. 

The practical object of the integraph is the approximate 
evaluation of definite integrals ; for us its application to the 
construction of π is of especial interest.

3. Take for differential curve the circle

$$x^2 + y^2 = r^2;$$

the integral curve is then

$$Y = \int \sqrt{r^2 - x^2}\,dx = \frac{r^2}{2}\sin^{-1}\frac{x}{r} + \frac{x}{2}\sqrt{r^2 - x^2}.$$

This curve consists of a series of congruent branches. The
points where it meets the axis of Y have for ordinates

$$0,\ \pm\frac{r^2\pi}{2},\ \pm r^2\pi,\ \cdots$$

Upon the lines $X = \pm r$ the intersections have for ordinates

$$\pm\frac{r^2\pi}{4},\ \pm\frac{3r^2\pi}{4},\ \cdots$$

If we make r = 1, the ordinates of these intersections will
determine the number π or its multiples.
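What the integraph does mechanically can be imitated numerically. The sketch below is only an illustration: it approximates the ordinate of the integral curve of the circle by a midpoint rule, with a step count chosen arbitrarily, and the function name `integral_curve_ordinate` is hypothetical.

```python
from math import pi, sqrt


def integral_curve_ordinate(x_end, r=1.0, steps=200_000):
    """Approximate Y(x_end) = integral from 0 to x_end of sqrt(r^2 - x^2) dx."""
    width = x_end / steps
    total = 0.0
    for i in range(steps):
        x_mid = (i + 0.5) * width  # midpoint of the i-th strip
        total += sqrt(r * r - x_mid * x_mid)
    return total * width


if __name__ == "__main__":
    ordinate = integral_curve_ordinate(1.0)
    print(4 * ordinate, pi)
```

With r = 1 the ordinate at x = 1 approximates π/4, so four times the computed value should agree with π to several decimal places.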

It is worthy of notice that our apparatus enables us to 
trace the curve not in a tedious and inaccurate manner, but 
with ease and sharpness, especially if we use a tracing pen 
instead of a pencil. 

Thus we have an actual constructive quadrature of the
circle along the lines laid down by the ancients, for our
curve is only a modification of the quadratrix considered
by them.


NOTES 

PART I — CHAPTER III 

Gaussian Polygons. Up to the time of Gauss, no one suspected
that it was possible to construct, with ruler and compasses, regular
polygons other than those the number of whose sides could be
expressed in one of the forms: $2^n$, $2^n\cdot 3$, $2^n\cdot 5$, $2^n\cdot 15$. All of these
were known to the Greeks. But Gauss proved as early as 1801¹
that whenever a prime number $F_\mu$ could be expressed in the form
$2^{2^\mu}+1$, the construction of a regular polygon with $F_\mu$ sides was
possible by Euclidean methods. It was then apparent that regular
polygons not included in the Euclidean series, namely those of 17, 257,
65537, . . . sides, could be constructed under the same imposed
conditions. And indeed Gauss’s discussion led to the result², that
the only regular polygons which it is possible to construct with
ruler and compasses are those the number P of whose sides can
be expressed in the form

$$2^{\alpha}\,(2^{2^{\alpha_1}}+1)(2^{2^{\alpha_2}}+1)(2^{2^{\alpha_3}}+1)\cdots(2^{2^{\alpha_s}}+1),$$

where $\alpha_1, \ldots, \alpha_s$ are distinct positive integers and each $2^{2^{\alpha_i}}+1$ is
a prime. The number of such polygons is small in comparison with
the number of regular polygons which cannot be constructed with
the means employed. As Dickson has pointed out³ the number of
P’s up to 100 is 24; up to 300 is 37 (all noted by Gauss); up to 1000
is 52; up to 1000000 only 206. Kraitchik has remarked⁴ that there
are only 30 polygons with an odd number of sides that are known
to be constructible with ruler and compasses. These polygons have
the following number of sides: 5, 15, 17, 51, 85, 255, 257, 771, 1285,
3855, 4369, 13107, 21845, 65535, 65537, 196611, 327685, 983055,
1114129, 3342387, 5570645, 16711935, 16843009, 50529027,
84215045, 252645135, 286331153, 858993459, 1431655765,
4294967295. This set of numbers, together with 1 and 3, coincides
with the divisors of $2^{32} - 1 = 3\cdot 5\cdot 17\cdot 257\cdot 65537$.

¹ Disquisitiones arithmeticae, Leipzig, 1801, p. 664; Werke, v. 1, 2. Abdruck,
1870, p. 462; French ed. Recherches Arithmétiques, Paris, 1807,
p. 488; Ger. ed. by Maser, Berlin, 1889, p. 447.

² This result was, in effect, stated, but not proved, by Gauss.

³ L. E. Dickson, “On the number of inscriptible regular polygons”,
Bull. N. Y. Math. Soc., Feb., 1894, v. 3, p. 123.

⁴ Kraitchik, Recherches sur la théorie des nombres, Paris, 1924, p. 270.
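The list above can be regenerated by forming every product of distinct Fermat primes. This is a small sketch for checking the enumeration; the function name `odd_side_counts` is hypothetical, and the five primes are those named in the text.

```python
from itertools import combinations
from math import prod

FERMAT_PRIMES = [3, 5, 17, 257, 65537]


def odd_side_counts():
    """All products of distinct known Fermat primes, including the empty product 1."""
    values = []
    for size in range(len(FERMAT_PRIMES) + 1):
        for subset in combinations(FERMAT_PRIMES, size):
            values.append(prod(subset))
    return sorted(values)


if __name__ == "__main__":
    counts = odd_side_counts()
    print(len(counts), counts[:8], counts[-1])
```

The 32 values obtained are 1, 3 and the 30 numbers listed, and each of them divides $2^{32} - 1$.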

The determination of the number of regular polygons which can
be constructed for P less than a given integer is, then, bound up
in the determination of the prime numbers $F_\mu$. Now for only 18
values of μ has it been shown whether $F_\mu$ is prime or not, namely
for the values of μ from 0 to 9 inclusive, and for 11, 12, 15, 18, 23,
36, 38, 73. In the first five of these cases, and in these alone, is $F_\mu$
prime. These five cases were noted by Fermat in the seventeenth
century. It may well turn out that $F_\mu$ is not prime, for μ > 4,
although Eisenstein proposed as a problem¹: “There are an infinity
of prime numbers of the form $2^{2^\mu} + 1$”.

The results already established in this connection may be set
forth in tabular form²:

  μ     Prime Factors of F_μ = 2^(2^μ) + 1              Discoverer                           Year of Discovery
  ----  ----------------------------------------------  -----------------------------------  -----------------
  0–4   3, 5, 17, 257, 65537                            Fermat                               1640
  5     2^7·5 + 1 = 641                                 L. Euler                             1732
        2^7·52347 + 1 = 6700417
  6     Unknown but composite                           Lucas                                1878
        2^8·3^2·7·17 + 1 = 274177                       Landry                               1880
        2^8·5·52562829149 + 1 = 67280421310721          Landry and Le Lasseur                1880
  7     Unknown but composite                           A. E. Western, J. C. Morehead        1905
  8     Unknown but composite                           A. E. Western, J. C. Morehead        1909
  9     2^16·37 + 1 = 2424833                           A. E. Western                        1903
  11    2^13·3·13 + 1 = 319489                          A. Cunningham                        1899
        2^13·7·17 + 1 = 974849
  12    2^14·7 + 1 = 114689                             E. A. Lucas and P. Pervouchine       1877
        2^16·397 + 1 = 26017793                         A. E. Western                        1903
        2^16·7·139 + 1 = 63766529
  15    2^21·579 + 1 = 1214251009                       M. Kraitchik                         1925
  18    2^20·13 + 1 = 13631489                          A. E. Western                        1903
  23    2^25·5 + 1 = 167772161                          P. Pervouchine                       1878
  36    2^39·5 + 1 = 2748779069441                      Seelhoff                             1886
  38    2^41·3 + 1 = 6597069766657                      J. Cullen, A. Cunningham,            1903
                                                        A. E. and F. J. Western
  73    2^75·5 + 1 = 188894659314785808547841           J. C. Morehead                       1906
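Each factor in the table can be checked without ever writing out $F_\mu$, by the kind of congruence computation the note alludes to: q divides $F_\mu$ exactly when $2^{2^\mu} \equiv -1 \pmod q$. The sketch below uses Python's built-in modular exponentiation; the dictionary `FACTORS` simply transcribes the table, and `divides_fermat` is a hypothetical helper name.

```python
# Factors of F_mu = 2^(2^mu) + 1 as given in the table, keyed by mu.
FACTORS = {
    5: [641, 6700417],
    6: [274177, 67280421310721],
    9: [2424833],
    11: [319489, 974849],
    12: [114689, 26017793, 63766529],
    15: [1214251009],
    18: [13631489],
    23: [167772161],
    36: [2748779069441],
    38: [6597069766657],
    73: [188894659314785808547841],
}


def divides_fermat(q, mu):
    """True when q divides F_mu, i.e. when 2^(2^mu) is congruent to -1 modulo q."""
    return pow(2, 2**mu, q) == q - 1


if __name__ == "__main__":
    for mu, factors in FACTORS.items():
        print(mu, [divides_fermat(q, mu) for q in factors])
```

Because only repeated squaring modulo q is involved, even the case μ = 73 takes a negligible amount of time.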


¹ G. Eisenstein, “Aufgaben”, Crelle’s Journal, v. 27, 1844, p. 87.

² The sources for the different results, except those of Fermat, are as
follows, for the 13 different values of μ:

5. L. Euler, Commentarii Academiae Scientiarum Petrop., v. 6 (1732–3),
1738, p. 104; laid before the Academy of St. Petersburg, 26 Sept. 1732.

In his autobiography (Springfield, Mass., 1833, p. 38) the American
calculator Zerah Colburn records that while on exhibition in London, at
the age of 8, he found “by the mere operation of his mind” the factors 641
and 6,700,417 of 4,294,967,297 (= $2^{32} + 1$). Cf. F. D. Mitchell, “Mathematical
prodigies”, Amer. Journal of Psychology, v. 18, 1907, p. 65.

6. Lucas, Comptes Rendus de l’Académie des Sciences, Paris, v. 85, 1878,
p. 138; Amer. Jour. Math., v. 1, 1878, p. 238; Récréations mathématiques,
v. 2 (2e éd., 1896), p. 234–5. Landry, Nouv. Corresp. Math., v. 6, 1880,
p. 417.

7. Independent discoverers: Western, Proc. Lond. Math. Soc., s. 2, v. 3,
p. xxi–xxii, abstract of paper read April 13, 1905; Morehead, Bull. Amer.
Math. Soc., v. 11, p. 543–545, abstract of paper read April 29, 1905.

8. Western and Morehead, Bull. Amer. Math. Soc., v. 16, 1909, p. 1–6;
“each doing half of the whole work”.

9, 12 (Western), 18, 38. Proc. Lond. Math. Soc., s. 2, v. 1, 1903, p. 175;
abstract of paper read May 14, 1903.

11. A. Cunningham, Brit. Assoc. Rept., 1899, p. 653–4; the factors
are here given as 319489 and 974489. The second number is incorrect,
4 and 8 being interchanged. The other forms of the correct factors were
given by A. Cunningham and A. E. Western in Proc. Lond. Math. Soc.,
s. 2, v. 1, 1903, p. 175. It is here noted also that there are no more factors
of $F_\mu < 10^6$, and no other factor of $F_\mu < 10^8$ (μ not less than 14).

12, 23. E. Lucas, Atti Accad. Torino, v. 13 (1877–8), p. 271 [27 Jan.,
1878]. Mélanges math. ast. acad. Pétersb., v. 5, part 5, 1879, p. 505, 519
or Bull. Acad. Pétersb., s. 3, v. 24, 1878, p. 559; s. 3, v. 25, 1879, p. 63;
communication of results, for μ = 12 and 23, found by J. Pervouchine
in Nov. 1877 and Jan. 1878. He notes that the integer $2^{2^{23}} + 1$ contains
2 525 223 digits.

15. M. Kraitchik, Comptes Rendus de l’Académie des Sciences, Paris, v. 180,
p. 800, March, 1925; also Sphinx-Oedipe, v. 20, p. 24.

36. P. Seelhoff, Zeitschrift math. u. Phys., v. 31, 1886, p. 174.

73. J. C. Morehead, Bull. Amer. Math. Soc., v. 12, 1906, p. 449–451.

The labor expended in deriving these results has been enormous;
to the layman who knows nothing of congruences in the theory of
numbers, the facts found must seem almost to border on the miraculous.
For, even when μ = 10, a case not yet solved, $F_\mu$ contains
309 digits; but when μ = 36, $F_\mu$ is a number of more than twenty
trillion digits. Concerning it Lucas remarked¹ “la bande de papier
qui le contiendrait ferait le tour de la Terre”. For μ = 73, Ball
states that the digits in $F_\mu$ “are so numerous that, if the number
were printed in full with the type and number of pages used in this book
[Mathematical Recreations, fifth edition, 1911, 508 pages], many more
volumes would be required than are contained in all the public
libraries of the world”.

¹ E. Lucas, Théorie des nombres, Paris, v. 1, 1891, p. 51.
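The digit counts quoted for $F_\mu$ follow from a logarithm, since $2^{2^\mu} + 1$ has the same number of decimal digits as $2^{2^\mu}$. The sketch below assumes floating-point accuracy is adequate, which holds comfortably for the small cases checked here; `fermat_digit_count` is a hypothetical helper name.

```python
from math import floor, log10


def fermat_digit_count(mu):
    """Number of decimal digits of F_mu = 2^(2^mu) + 1, via logarithms."""
    return floor(2**mu * log10(2)) + 1


if __name__ == "__main__":
    for mu in (10, 23, 36):
        print(mu, fermat_digit_count(mu))
```

For μ = 10 and μ = 23 this reproduces the 309 and 2 525 223 digits cited in the notes; for μ = 36 the formula gives a count of roughly $2.07 \times 10^{10}$.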

In not less than seven places¹, during the years 1640–58, did
Fermat refer to $F_\mu = 2^{2^\mu} + 1$ as representing a series of prime
numbers; but in no place did he claim that $F_\mu$ was always prime.

Gauss’s Statement of his Polygon Results. In two passages
the implication to be drawn from what Klein has written is, that
Gauss published a proof that a regular polygon of p sides cannot
be constructed by ruler and compasses if p is a prime not of the
form $2^k + 1$. The passages to which I refer are (pages 2, 16):

(1) “Gauss added other cases [to Euclid’s] by showing the possibility
of the division into parts where p is a prime number of the
form $p = 2^{2^\mu} + 1$, and the impossibility for all other numbers”;
(2) “Gauss extended this series of numbers [Euclid’s] by showing
that the division is possible for every prime number of the form
$p = 2^{2^\mu} + 1$ but impossible for all other prime numbers and their
powers”. Now the implication referred to above is not correct, as
Pierpont interestingly set forth in his paper “On an undemonstrated
theorem of the Disquisitiones Arithmeticae”². That is, Gauss did not
give a proof of the “impossibility” referred to in the quotations. But
after proving the “possibility” described above he continued as follows :

“As often as p − 1 contains other prime factors besides 2, we arrive
at higher equations³, namely, to one or more cubic equations, if 3 enters
once or oftener as a factor of p − 1, to equations of 5th degree if p − 1
is divisible by 5, etc. And we can prove with all rigour that these
equations cannot be avoided or made to depend upon equations of lower
degree; and although the limits of this work do not permit us to give
the demonstration here, we still thought it necessary to signal this fact
in order that one should not seek to construct other polygons than
those given by our theory, as, for example, polygons of 7, 11, 13, 19
sides, and so employ one’s time in vain.”

¹ Letter dated Aug. [?] 1640 to Frenicle (Oeuvres de Fermat, v. 2, 1894,
p. 206); letter dated 18 Oct., 1640, to Frenicle (Oeuvres, v. 2, 1894, p. 208;
Varia Opera, Toulouse, 1679, p. 162; Brassinne’s Précis, Toulouse, 1853,
p. 142–3); letter dated 25 Dec., 1640, to Mersenne (Oeuvres, v. 2, p. 212–213);
“De solutione problematum geometricorum per curvas simplicissimas
et unicuique problematum generi proprie convenientes, Dissertatio tripartita”
(Oeuvres de Fermat, v. 1, 1891, p. 130–131; French translation,
v. 3, 1896, p. 120; Varia Opera, 1679 [reprint, 1861], p. 115); letter dated
29 August, 1654, to Pascal (Oeuvres de Pascal, v. 4, Paris, 1819, p. 384;
Oeuvres de Fermat, v. 2, 1894, p. 309–310); letter to Sir Kenelm Digby,
sent by Digby to Wallis, 19 June, 1658 (Oeuvres de Fermat, v. 2, 1894,
p. 402, 404–5; French translation of the Latin, v. 3, 1896, p. 314, 316);
letter dated August, 1659 to Carcavi, copy sent by Carcavi to Huygens
14 August, 1659 (Corresp. de Huygens no. 651; Oeuvres de Fermat, v. 2,
p. 433–434).

² Bull. Amer. Math. Soc., v. 2, 1895, p. 77–83.

³ In his earlier discussion of an inscribed polygon of p sides, Gauss
considers the equation $x^p - 1 = 0$ and the resulting equation got by
dividing out the factor x − 1, where p is a prime.

Fermat’s Theorem. This theorem (p. 17) was indicated by Fermat
in a letter, dated 18 October 1640, to B. Frenicle de Bessy
(Oeuvres de Fermat, v. 2, 1894, p. 209). Euler gave two proofs
(Comment. Acad. Petrop., v. 8 for 1736, 1741, p. 141, and Comment.
Nov. Acad. Petrop., v. 7 for 1758-59, 1761, p. 49). Other proofs
are due to Lagrange (Nouv. Mém. de l’Acad. de Berlin, 1771) and
to Gauss (Disquisitiones Arithmeticae, § 49).


PART I — CHAPTER IV 

Geometrical Constructions of the Regular Heptadecagon. The 
remark of Klein (p. 24, 32) that we possess as yet no method of
construction of the regular polygon of seventeen sides, based upon 
considerations purely geometrical, is a little curious, since several 
constructions of this kind have been given. One by Erchinger was 
indeed reported by Gauss in 1825 1 . The construction is as follows: 

Let D, B, G, A, I, F, C, E be points on a line determined by con- 
structions about to be given. Let AB be a line of any length. Pro- 
duce it both ways to C and D so that, 

[Figure: the points D, B, G, A, I, F, C, E marked in this order on a straight line.]

$$AC \times BC = AD \times BD = 4\,AB^2.$$


1 Göttingische gelehrte Anzeigen, Dec. 19, 1825, no. 203, p. 2025; Werke,
v. 2, p. 186-7. To Art. 365 of the Disquisitiones Arithmeticae Gauss added
this note in his handwriting: “Circulum in 17 partes divisibilem esse
geometrice, deteximus 1796 Mart. 30”. Cf. Werke, v. 1, p. 476 and v. 10₁,
1917, p. 3-4, 120-126, 488. The discovery of the result was first announced
in the Intelligenzblatt of the Allgemeine Literatur-Zeitung, no. 66, 1 June,
1796, col. 554.




Further determine the points, E, G, on both sides of CA produced 
so that, 

$$AE \times EC = AG \times CG = AB^2;$$

and find the point F on the side A of the line BA produced, such
that

$$AF \times DF = AB^2.$$

Finally divide AE in I so that

$$AI \times EI = AB \times AF,$$

where AI is the smaller, and EI the larger part of AE. Then construct
a triangle, in which each of two sides equals AB, the third
being equal to AI. About this triangle describe a circle; then AI
will be one side of the regular inscribed polygon of seventeen sides.
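The chain of products in this construction can be followed numerically. The sketch below is only an illustration, not part of Erchinger's synthetic proof: it takes AB as the unit, reads the condition on D as AD × BD = 4 AB² (an assumption, being the reading under which the points fall in the order D, B, G, A, I, F, C, E), and solves each product for the unknown length as the root of a quadratic. The function names are hypothetical.

```python
import math

def _positive_root(b, c):
    """Positive root t of t*t + b*t - c = 0 (with c > 0)."""
    return (-b + math.sqrt(b * b + 4 * c)) / 2

def erchinger_lengths(ab=1.0):
    """Return (AI, side of the 17-gon in the circle about the triangle AB, AB, AI)."""
    ac = _positive_root(ab, 4 * ab**2)      # AC * BC = 4 AB^2, with BC = AB + AC
    bd = _positive_root(ab, 4 * ab**2)      # AD * BD = 4 AB^2, with AD = AB + BD (assumed reading)
    ad = ab + bd
    ae = _positive_root(-ac, ab**2)         # AE * EC = AB^2, with EC = AE - AC
    af = _positive_root(ad, ab**2)          # AF * DF = AB^2, with DF = AD + AF
    # AI * EI = AB * AF with EI = AE - AI; AI is the smaller root.
    ai = (ae - math.sqrt(ae**2 - 4 * ab * af)) / 2
    # Circumradius of the isosceles triangle with legs AB and base AI.
    height = math.sqrt(ab**2 - ai**2 / 4)
    radius = ab**2 / (2 * height)
    side_17 = 2 * radius * math.sin(math.pi / 17)
    return ai, side_17

ai, side_17 = erchinger_lengths()
print(ai, side_17)   # the two numbers should agree to rounding error
```

Under these assumptions AI comes out equal to 2 AB sin(π/34), which is the chord of one seventeenth of the circumscribed circle.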

Gauss particularly remarks that the author gave a purely syn- 
thetic proof of this construction. 

Another synthetic construction and proof dated “Dublin, 17th
October, 1819” was published by Samuel James in the Transactions
of the Irish Academy 1. Yet another construction was given by John
Lowry in The Mathematical Repository 2 for 1819. But the earliest
published geometrical construction was given by Huguenin in his
Mathematische Beiträge zur weiteren Ausbildung angehender Geometer,
Königsberg, 1803, p. 283.

A score of geometrical constructions are assembled in A. Goldenring,
Die elementargeometrischen Konstruktionen des regelmässigen
Siebzehnecks, Leipzig, 1915. See also the review of this work in
Bull. Amer. Math. Soc., v. 22, 1916, p. 239 — 246, and my note 
“Gauss and the regular polygon of seventeen sides” in Amer. Math. 
Monthly, v. 27, 1920, p. 323—326. 

The discovery that the regular polygon of seventeen sides could 
be constructed with ruler and compasses was not only one of which 
Gauss was vastly proud throughout his life, but also, according to 
Sartorius von Waltershausen 3 , the one which decided him to dedicate 
his life to the study of mathematics. Archimedes expressed the wish 
that a sphere inscribed in a cylinder be inscribed on his tomb, as 
Ludolf van Ceulen did in connection with the value of π to 35 decimal

1 V. 13 (1818), p. 175-187; paper read Jan. 24, 1820.

2 N. s., v. 4, p. 160. Lowry’s proof occupies p. 160-168.

3 Gauss zum Gedächtniss, Leipzig, 1856, p. 16.




places, and Jacques Bernoulli with reference to the logarithmic 
spiral. So also, according to Weber 1 , Gauss requested that the 
regular polygon of seventeen sides should be engraved on his tomb- 
stone. While this request was not granted, as it was in each of 
the other cases mentioned, it is engraved on the side of a monument 
to Gauss in Braunschweig, his birthplace. 

Constructions in general with Ruler and Compasses. Regarding 
constructions as effected when intersections of circles with circles 
or lines, or of lines with lines may be determined, it can be shown 
that: Every problem solved with ruler and compasses can be solved
with compasses alone. This was first shown by Georg Mohr in his 
Euclides Danicus published at Amsterdam in 1672; this work was 
reprinted in 1928 by the Danish Society of Sciences. Klein refers 
(p. 33) only to Mascheroni’s proof of this result 125 years later, in 
his Geometria del Compasso. Of this work there were two French
editions, Géométrie du Compas, Paris, 1798 and 1828. From the
first of these a German edition, L. Mascheroni’s Gebrauch des Zirkels,
Berlin, 1825, was prepared by J. P. Gruson. The subject is treated
in English by : A. Cayley, Messenger of Math., v. 14, 1885, p.179—181 ; 
Collected Papers, v. 12, p. 314 — 317; by E. W. Hobson, in a presi- 
dential address, Mathematical Gazette, v. 7, 1913, p. 49—54; by 
H.P. Hudson, Ruler & Compasses, London, 1916, p. 131 — 143; and by 
J. Coolidge, Treatise on the Circle and Sphere, Oxford, 1916, p. 186 — 188. 

Klein has noted (p. 33 — 34) that Poncelet first conceived the 
result that given a circle and its center, every solution of a problem 
with ruler and compasses can be carried through with ruler alone. 
A little later Klein states (p. 34) “we will show how with the straight 
edge and one fixed circle we can solve every quadratic equation”. This
is not possible; Klein should have had “with its center” after “one 
fixed circle”. That the center be also given is very essential when 
only one circle is given. Hilbert suggested the problem: How many 
given circles in a plane are necessary in order to determine with 
ruler alone, the center of one of them? In 1912 D. Cauer 2 showed: 
(a) If two circles do not intersect in real points it is generally impossible 
to determine the center of either circle with ruler alone; (b) A center 


1 Encyklopädie der elementaren Algebra und Analysis bearbeitet von H.
Weber. 2. ed. Leipzig, 1906, p. 362.

2 Mathematische Annalen, v. 73, 1912, p. 90-94; v. 74, 1913, p. 462-464.




may be determined if the circles cut in real points, touch, or are 
concentric. About the same time J. Grossmann discovered a result 
which proved that Every problem solvable with ruler and compasses 
can also be solved with ruler alone if we are given, in the plane of 
construction, three linearly independent circles. Correct proofs of 
this result were given by Schur and Mierendorff. 

From this it is clear that every construction with ruler and com- 
passes can be effected with a ruler, and compasses with a fixed 
opening. Constructions of this kind were found already in the tenth 
century by Abu’l Wefa of Bagdad 1. With such means, in the sixteenth
century, certain problems of Euclid were solved by Cardano,
Ferrari, and Tartaglia. At Venice in 1553 G. B. Benedetti published
a little treatise, Resolutio omnium Euclidis problematum, aliorumque
ad hoc necessario inventorum, una tantumodo circuli data apertura.
In English the topic is treated in a rare pamphlet translated from 
the Dutch by Joseph Moxon 2 , and in an article by J. S. Mackay 3 . 

Every problem whose solution is possible by ruler and compasses 
can be also solved with a two edged ruler alone, whether the edges 
are parallel or meet in a point. For some of the literature in this 
connection the following sources may be consulted: Nouvelle Corresp. 
Math., v. 3, 1877, p. 204-208; v. 5, 1879, p. 439-442; v. 6, 1880,
p. 34-35; Akademie der Wissen., Vienna, Sitzungsberichte, Abt. IIa,
v. 99, 1890, p. 854-858; Bollettino di Matematiche e di Scienze fisiche
e naturali, v. 2, 1900-01, p. 129-145, 225-237.

PART II — CHAPTER II 

Irrationality of π. Klein wrote (p. 59): “After 1770 critical rigour
gradually began to resume its rightful place. In this year appeared
the work of Lambert: Vorläufige Kenntnisse für die, so die Quadratur

1 Woepcke, “Analyse et extrait d’un recueil de constructions géométriques
par Aboûl Wafâ”, Journal Asiatique, 1855.

2 Compendium Euclidis Curiosi: or, geometrical operations. Showing how
with a single opening of the Compasses and a straight ruler all the propositions
of Euclid’s first five books are performed. London, 1677. Moxon does not
tell us who the author of the Dutch treatise was.

3 “Solutions of Euclid’s problems, with a ruler and one fixed aperture
of the compasses, by the Italian geometers of the sixteenth century”,
Proc. Edinb. Math. Soc., v. 5, 1887, p. 2-22.




. . . des Cirkuls suchen. Among other matters the irrationality of π
is discussed. In 1794 Legendre in his Éléments de Géométrie showed
conclusively that π and π² are irrational numbers.” The implication
of this note is that Lambert did not discuss the irrationality of π
conclusively and that Legendre did. How both of these points of 
view are essentially incorrect will appear in what follows. Klein 
was simply reproducing the erroneous statements of Rudio 1 ; but 
after Pringsheim’s careful study in 1898 2 , Lambert’s proof emerged 
as “ausserordentlich scharfsinnig und im wesentlichen vollkommen 
einwandfrei”, while Legendre’s remained “in Bezug auf Strenge 
hinter Lambert weit zurück”.

As in the later proof of the transcendence of π, so here when
its irrationality was in question, discussion of e is fundamental.
The irrationality of e and e² was shown, substantially, by Euler in
1737 3 and he gave the expression for e as a continued fraction on
which Lambert’s proofs of the irrationality of $e^x$, tan z and π rest.
Starting with Euler’s development 4

$$\frac{e-1}{2} = \cfrac{1}{1+\cfrac{1}{6+\cfrac{1}{10+\cfrac{1}{14+\cfrac{1}{18+\text{etc.}}}}}},$$

Lambert found

$$\frac{e^x-1}{e^x+1} = \cfrac{1}{2/x+\cfrac{1}{6/x+\cfrac{1}{10/x+\cfrac{1}{14/x+\text{etc.}}}}},$$

and since 


1 F. Rudio: Archimedes, Huygens, Lambert, Legendre, vier Abhandlungen
über die Kreismessung. Leipzig, 1892, p. 56 f. This error is also reproduced
by B. Calò in Enriques’s Fragen der Elementargeometrie, II. Teil, 1907,
p. 315; by D. E. Smith in Young’s Monographs on Topics of Modern Mathematics,
1911, p. 401. The matter was correctly set forth by T. Vahlen in
Konstruktionen und Approximationen, Leipzig, 1911, p. 319 f.

2 A. Pringsheim: “Über die ersten Beweise der Irrationalität von e und
π”, Bayerische Akad. der Wissen., Sitzungsberichte, mathem.-phys. Cl.,
v. 28, 1899, p. 325-337.

3 “De fractionibus continuis”, Comment. acad. de Petrop., v. 9, 1744,
p. 108. Presented to St. Petersburg Academy, March, 1737.

4 L. Euler: Introductio in analysin infinitorum. Tomus Primus, Lausannae,
1748, p. 319. This work was finished in 1745; cf. G. Eneström,
Verzeichnis etc., Erste Lieferung, p. 25.




$$\frac{e^x-1}{e^x+1} = \frac{e^{x/2}-e^{-x/2}}{e^{x/2}+e^{-x/2}},$$

$$\tan z = \cfrac{1}{1/z-\cfrac{1}{3/z-\cfrac{1}{5/z-\cfrac{1}{7/z-\cfrac{1}{9/z-\cdots}}}}}.$$


He then proved the theorems: 

1. If x is a rational number different from zero, $e^x$ can never be
rational.

For x = 1, we have as special case the irrationality of e.

2. If z is a rational number different from zero, tan z can never be
rational.

For z = π/4, tan π/4 = 1, and hence as a special case the irrationality
of π.
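The two continued fractions on which these theorems rest can be checked numerically. The following is a minimal sketch, not Lambert's argument: it truncates each fraction after a fixed number of partial quotients (the parameter `depth` and the function names are assumptions made for illustration) and evaluates it in exact rational arithmetic from the innermost term outwards.

```python
import math
from fractions import Fraction

def lambert_tan(z, depth=12):
    """Truncation of 1/(1/z - 1/(3/z - 1/(5/z - ...))) after `depth` quotients."""
    z = Fraction(z)
    value = Fraction(2 * depth - 1) / z          # innermost quotient (2n-1)/z
    for odd in range(2 * depth - 3, 0, -2):      # ..., 5, 3, 1
        value = Fraction(odd) / z - 1 / value
    return 1 / value

def lambert_exp_ratio(x, depth=12):
    """Truncation of 1/(2/x + 1/(6/x + 1/(10/x + ...))), i.e. (e^x - 1)/(e^x + 1)."""
    x = Fraction(x)
    value = Fraction(4 * depth - 2) / x          # innermost quotient (4n-2)/x
    for num in range(4 * depth - 6, 0, -4):      # ..., 10, 6, 2
        value = Fraction(num) / x + 1 / value
    return 1 / value

# Comparison with the library functions for a rational argument.
print(float(lambert_tan(1)), math.tan(1))
print(float(lambert_exp_ratio(1)), (math.e - 1) / (math.e + 1))
```

For a rational argument every truncation is itself a rational number; the force of Lambert's theorems lies in showing that the limit is not.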


The part of Lambert’s Vorläufige Kenntnisse to which Klein refers
contains some formulae without proof, and no analytical developments,
and was rather intended to serve as a popular survey of
the treatment of the topic. With it must be considered the scientifically
remarkable “Mémoire” of 1767 1. Here “mit minutiöser
Genauigkeit” Lambert proves the convergence of the expression
for tan z as a continued fraction. Pringsheim dwells on the “astounding”
nature of these considerations at this period in the history of
mathematical thought. For of such considerations Legendre was
innocent, as well as the great Gauss in his 1812 memoir on hypergeometric
series, and others, till a much later period.

Thus the Lambert memoir contains the first, and for many
years, the only example of what we now consider really rigorous
developments of functions as converging continued fractions, in
particular, that for tan z given above.”

Measurement of a Circle. By considering inscribed and circum- 
scribed polygons up to 96 sides Archimedes arrived at the result 
that the ratio of the circumference of a circle to its diameter is less 
than $3\tfrac{1}{7}$ but greater than $3\tfrac{10}{71}$. The following table exhibits the
perimeters of regular inscribed and circumscribed polygons of a 
circle with a unit diameter (Chauvenet, Treatise on Elementary 
Geometry, Philadelphia, 1870, p. 161). 


1 “Mémoire sur quelques propriétés remarquables des quantités transcendantes
circulaires et logarithmiques”. Lu en 1767. Printed in 1768 in
Hist. de l’acad. royale des sciences et belles-lettres, Berlin, Année 1761 (1),
p. 265-322.




Number        Perimeter of              Perimeter of
of sides      circumscribed polygon     inscribed polygon

     4             4.0000000                 2.8284271
     8             3.3137085                 3.0614675
    16             3.1825979                 3.1214452
    32             3.1517249                 3.1365485
    64             3.1441184                 3.1403312
   128             3.1422236                 3.1412773
   256             3.1417504                 3.1415138
   512             3.1416321                 3.1415729
  1024             3.1416025                 3.1415877
  2048             3.1415951                 3.1415914
  4096             3.1415933                 3.1415923
  8192             3.1415928                 3.1415926
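A table of this kind can be regenerated by doubling the number of sides. The sketch below is offered as an illustration and is not claimed to be the method by which the printed table was computed. It uses the standard doubling relations: the new circumscribed perimeter is the harmonic mean of the two old perimeters, and the new inscribed perimeter is the geometric mean of the old inscribed and the new circumscribed perimeter. It starts from the square about and within a circle of unit diameter; the function name and its parameter are hypothetical.

```python
import math

def polygon_perimeters(doublings=11):
    """Yield (sides, circumscribed, inscribed) for 4, 8, 16, ... sides, unit diameter."""
    sides = 4
    circumscribed = 4.0               # square about the circle
    inscribed = 2 * math.sqrt(2)      # square within the circle
    yield sides, circumscribed, inscribed
    for _ in range(doublings):
        # harmonic mean of the two perimeters for the same number of sides
        circumscribed = 2 * circumscribed * inscribed / (circumscribed + inscribed)
        # geometric mean of the old inscribed and the new circumscribed perimeter
        inscribed = math.sqrt(inscribed * circumscribed)
        sides *= 2
        yield sides, circumscribed, inscribed

for sides, outer, inner in polygon_perimeters():
    print(f"{sides:6d}  {outer:.7f}  {inner:.7f}")
```

At every stage the inscribed perimeter lies below π and the circumscribed perimeter above it, so each row gives a pair of bounds.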


The remarkable approximation 355/113 for π is correct to six
places of decimals. It seems to have been first given by a Chinese,
Tsu Ch’ung-ching (5th century), and later by Valentin Otho (16th
century) and Adriaen Anthonisz (17th century). Grunert gave a
geometrical construction for π based on the fact that $355/113 = 3 +
4^2/(7^2 + 8^2)$, Archiv der Mathematik und Physik, v. 12, 1849, p. 98.

Another construction was given by Ramanujan in Journ. Indian 
Math. Soc., v. 6, 1913, p. 132 (also in Collected Papers of Srinivasa
Ramanujan, Cambridge, 1927, p. 22, 35). 

Euler’s Formula. The formula 

$$e^{ix} = \cos x + i \sin x$$

was first given by Euler in Miscellanea Berolinensia, v. 7, 1743,
p. 179 (paper read 6 Sept. 1742), and again in his Introductio in
Analysin, Lausanne, 1748, v. 1, p. 104. He gave also

$$e^{-ix} = \cos x - i \sin x.$$

The equivalent of the form

$$ix = \log (\cos x + i \sin x)$$

was given earlier by Roger Cotes ( Philosophical Transactions, 1714, 
v. 29, 1717, p. 32) as: “Si quadrantis circuli quilibet arcus, radio 




CE descriptus, sinum habeat CX, sinumque complementi ad quadrantem
XE: sumendo radium CE, pro Modulo, arcus erit rationis
inter $EX + XC\sqrt{-1}$ & CE mensura ducta in $\sqrt{-1}$.” See also Cotes,
Harmonia Mensurarum, 1722, p. 28.


PART II — CHAPTER IV 

In the course of the discussion on pages 61 — 74 it is assumed that 
there are an infinite number of prime numbers. One of the neatest 
proofs of this fact was given by Euclid (about 300 B.C.) in proposition 
20, book 9 of his Elements. 

On page 77, in considering the relation $y = e^x$, Klein made a
slight slip when he wrote: “To an algebraic value of x corresponds 
a transcendental value of y, and conversely.” “Conversely” leads 
to the statement, to a transcendental value of y corresponds an 
algebraic value of x. But a proof of this has nowhere been given; 
indeed the result is not true, in general. To correct delete “con- 
versely” and add: “To an algebraic value of y corresponds a transcen- 
dental value of x.” 


FROM 

DETERMINANT 

TO 

TENSOR 


W. F. SHEPPARD, Sc.D., LL.M. 


FORMERLY FELLOW OF TRINITY COLLEGE, CAMBRIDGE


PREFACE 


The tensor calculus used in the mathematical treat- 
ment of relativity, and concisely explained by Professor A. S. 
Eddington in his ‘Report on the Relativity Theory of
Gravitation’, is, like the various kinds of vector calculus,
a system of condensed notation which not only conduces to
economy in the writing of symbols, but, what is more
important, enables spatial and physical relationships to
be grasped as a whole without having to be built up from
a number of components which really represent views 
from different parts of space. Three-dimensional geometry 
or physics is troublesome enough : the addition of a fourth 
dimension made the need of a condensed notation imperative. 

Professor Eddington has recently pointed out that the 
tensor notation and methods can be applied, with happy 
results, to other and more elementary classes of problems 
than those for which they were originally devised ; and 
this book is an attempt to put his somewhat compressed 
exposition into a form in which it may appeal to a larger 
circle of readers. The book, therefore, is not intended as 
an introduction to the mathematical theory of relativity — 
though I hope it may be of some use for that purpose — 
but rather as an exercise in the elementary application of 
methods which, apart from any practical use, possess a special
beauty of their own. 

The new notation is not introduced until the fifth chapter. 
The properties of determinants, which serve as the starting- 
point for the application of the notation, are familiar to the 
mathematician ; but, as I hope the book may be read by 
some who are not entirely at ease with determinants, 




I have commenced with four chapters on the elementary 
theory of the subject. I make no apology for doing this, 
instead of referring the reader to the ordinary text-book 
on algebra. The text-book treatment is not always stimu- 
lating ; the reasons for the various stages are not necessarily 
clear to the student ; and attempt at simplicity sometimes 
leads to loss of rigidity in proof. In such a subject it 
is necessary to take the reader into one’s confidence ; and 
this earlier part may in this respect be found helpful to 
some, teachers or students, to whom the later part makes 
at first a less strong appeal. 

I have added a chapter on some applications to the 
theory of statistics, to which the tensor calculus seems 
specially suitable. The basis of this portion, so far as 
method is concerned, is a short paper by Professor Eddington, 
mentioned at the end of the chapter. This is one only of 
many possible applications. 

What I have called double sets will be recognized by 
the advanced student as matrices ; and many of the pro- 
positions will be found to be familiar. But the tensor 
calculus may fairly claim that, in bringing into close 
relation various branches of mathematical study, previously 
regarded as distinct, it gives them a new life. 

I have to thank Professor Eddington for looking at my 
manuscript and making some corrections and suggestions. 


5 June 1923. 


W. F. S. 


CONTENTS 


INTRODUCTION 

DETERMINANTS 

I. ORIGIN OF DETERMINANTS 

§ 1. Solution of simultaneous equations; § 2. Formula
for solution ; § 3. General problem ; § 4. Construction of 
terms ; § 5. Rule of signs. 

II. PROPERTIES OF DETERMINANTS .... 

§ 1. Definition of determinant ; § 2. Elementary proper- 
ties ; § 3. Properties depending on the rule of signs; 
§ 4. Cofactors and minors. 

III. SOLUTION OF SIMULTANEOUS EQUATIONS . 

§ 1. Statement of previous results ; § 2. General solution. 

IV. PROPERTIES OF DETERMINANTS ( continued ). 

§ 1. Sum of determinants; § 2. Multiplication of deter- 
minant by a single factor ; § 3. Alteration of column or 
row ; § 4. Calculation of determinant ; § 5. Product of 
determinants ; § 6. The adjoint determinant. 

V. THE TENSOR NOTATION 

§ 1. Main properties of determinant ; § 2. Reciprocal 
determinant ; § 3. Elements of reciprocal determinant ; 
§ 4. Set-notation ; § 5. Principles of set-notation ; § 6. Pro- 
duct-sum notation; § 7. Inner products of sets; § 8. Unit- 
set notation ; § 9. Properties of the unit set ; § 10. Deter- 
minant properties; § 11. Example of method. 

SETS 

VI. SETS OF QUANTITIES 

§ 1. Introductory; § 2. Single sets; § 3. Double sets; 

§ 4. Sets generally ; § 5. Sums and products of sets ; 

§ 6. Inner product ; § 7. The unit set ; § 8. Inverse double 
sets; § 9. Reciprocation; § 10. Continued inner products; 

§ 11. Partial sets. 


VII. RELATED SETS OF VARIABLES

§ 1. Variable sets ; § 2. Direct proportion of single sets ;
§ 3. Reciprocal proportion of single sets ; § 4. Cogredience
and contragredience ; § 5. Contragredience with linear
relation ; § 6. Ratios of sets generally ; § 7. Related sets
of higher rank.

VIII. DIFFERENTIAL RELATIONS OF SETS

§ 1. Derivative of a set ; § 2. Derivative of sum or product ;
§ 3. Derivative of function of a set ; § 4. Transformation
of quadratic form to sum of squares.

IX. EXAMPLES FROM THE THEORY OF STATISTICS

§ 1. Preliminary ; § 2. Mean-product set ; § 3. Conjugate
sets ; § 4. Conjugate sets with linear relations ; § 5. The
frequency-quadratic ; § 6. Criteria for improved values ;
§ 7. Determination of improved values.

X. TENSORS IN THEORY OF RELATIVITY

§ 1. Preliminary ; § 2. Single sets (vectors) ; § 3. Other
sets ; § 4. Reason for limitation ; § 5. Miscellaneous properties.

APPENDIX : Product of Determinants

INDEX OF SYMBOLS

GENERAL INDEX

INTRODUCTION 


As this is a comparatively new subject to most readers, 
it may be as well to explain briefly what it is about. 

A vector in (say) 3 dimensions is a directed quantity, 
determined as regards both direction and magnitude by its 
components, which are magnitudes measured along three 
definite axes. These axes being supposed to have been fixed 
beforehand, we can take them in some definite order; and 
a vector a is then determined by a set of 3 quantities, 
which we may call 

$$A_1 \quad A_2 \quad A_3.$$

Algebraically, the idea of a vector can be extended to something
which is determined by a set of m quantities
$$A_1 \quad A_2 \quad A_3 \; \dots \; A_m,$$
where m may have any value. 

A determinant (say)

$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}$$

is the algebraical sum of all the products that can be
formed in a certain way according to a certain rule of signs
from the set of quantities

$$\begin{matrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3. \end{matrix}$$

Each ‘element’ of this set has its position fixed in the set
by the numbers of horizontal and of vertical steps required
to reach it from the initial element $a_1$. Thus the set is
extended in two directions, while the set which determines
a vector is extended in one direction only. This applies to
a determinant with any number $m^2$ of elements.




In the same way we might have a set extended in 3 direc- 
tions, the symbols being written along the edges of a cube 
and along lines inside the cube or on its faces ; and we 
could, in theory, increase the number of ‘ directions ’ to 
4 or more, by proper convention as to the order in which 
the elements are to be taken. On the other hand, a single 
quantity — what in the language of vectors is called a scalar 
— may be regarded as a set not extended in any direction. 

The tensor calculus, using the word ‘tensor’ in its broad
sense, deals with all these different kinds of sets, in relation 
to sets of variables by which we can regard axes of refer- 
ence as being determined. In the narrower sense in which 
the word is used in reference to the theory of relativity, 
only sets which satisfy certain conditions are called tensors. 

In this book I have treated the tensor calculus as arising 
out of the use of determinants. Chapters I-IV deal with 
the elementary theory of determinants, so far as it is 
required for our purpose. (The student who is familiar 
with determinants can skip these chapters.) In Chapter V 
the tensor notation is introduced in successive steps, with 
explanatory remarks. These latter are in small print, not 
because they are less important, but in order not to break 
the continuity of the chapter as a whole. In Chapter VI 
these explanatory paragraphs (or parts of them) are brought 
together and amplified so as to give a general idea of the 
elementary properties of sets. Chapters VII and VIII
deal with some developments of the subject in its 
general aspect. Chapter IX shows the application of the 
methods to certain problems in the theory of statistics and 
of error ; this can be omitted by any one who wishes to 
pass on to Chapter X, which deals very briefly with the 
tensor in its more limited sense, as applied to the theory 
of relativity. 


DETERMINANTS 


I. ORIGIN OF DETERMINANTS 


I. 1. Solution of simultaneous equations. Determinants
ordinarily arise out of the solution of simultaneous
equations. Suppose we have two equations

$$\left.\begin{aligned} 5x + 2y &= 19 \\ 4x + 3y &= 18 \end{aligned}\right\}.$$

Then, if we used only elementary methods, we could
multiply the first by 3 and the second by 2, which would
give

$$\left.\begin{aligned} 15x + 6y &= 57 \\ 8x + 6y &= 36 \end{aligned}\right\},$$

and thence, by subtracting, we should have

$$7x = 21, \quad x = 3,$$

whence either equation would give

$$y = 2.$$

Similarly, if we had three equations

$$\left.\begin{aligned} 2x + 5y + 3z &= 4 \\ x - 3y - 2z &= -1 \\ -5x - 4y + z &= 7 \end{aligned}\right\},$$

we could, by eliminating z between the first and the
second and between the second and the third, obtain

$$\left.\begin{aligned} 7x + y &= 5 \\ -9x - 11y &= 13 \end{aligned}\right\},$$

whence, proceeding as before, we should obtain

$$x = 1, \quad y = -2, \quad z = 4.$$

This process of successive elimination is tedious, especially 
when there are more than two unknowns ; and it is found 



better to obtain a formula for the general solution and 
apply it to the numerical values of the particular case. 

Since such an equation as $x - 3y - 2z = -1$ can be
written in the form $(+1)x + (-3)y + (-2)z = (-1)$, we
can use positive signs throughout, it being understood that 
the quantity represented by any symbol may be either 
positive or negative. 
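The process of successive elimination described in this section can be mechanized. The following sketch is an illustration only: it eliminates each unknown in turn from all the other equations, which is a slightly different order of working from that shown above, and it uses exact fractions. The function name and the input layout are assumptions. The third equation of the second example is taken as $-5x - 4y + z = 7$, which is the reading consistent with the stated solution.

```python
from fractions import Fraction

def eliminate(equations):
    """Solve a square system; each equation is [coefficients..., right-hand side]."""
    rows = [[Fraction(v) for v in eq] for eq in equations]
    n = len(rows)
    for col in range(n):
        # choose an equation in which the current unknown actually occurs
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                # subtract a multiple of the pivot equation to remove the unknown
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]

print(eliminate([[5, 2, 19], [4, 3, 18]]))
print(eliminate([[2, 5, 3, 4], [1, -3, -2, -1], [-5, -4, 1, 7]]))
```

The sketch raises `StopIteration` when no equation contains the unknown being eliminated, that is, when the system has no unique solution.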


I. 2. Formula for solution. (i) For completeness we
begin with one unknown. The equation

$$a_1 x = k_1$$

gives

$$x = \frac{k_1}{a_1}.$$

(ii) For two unknowns the equations may be written

$$\left.\begin{aligned} a_1 x + b_1 y &= k_1 \\ a_2 x + b_2 y &= k_2 \end{aligned}\right\}.$$

Multiplying the first equation by $b_2$ and the second by $b_1$,
and subtracting, we get

$$(a_1 b_2 - a_2 b_1)\,x = k_1 b_2 - k_2 b_1,$$

whence

$$x = \frac{k_1 b_2 - k_2 b_1}{a_1 b_2 - a_2 b_1}.$$

Similarly, interchanging a’s and b’s,

$$y = \frac{k_1 a_2 - k_2 a_1}{b_1 a_2 - b_2 a_1} = \frac{a_1 k_2 - a_2 k_1}{a_1 b_2 - a_2 b_1}.$$


It should be noticed that the expressions for x and for y 
have the same denominator, and that the numerators are 



obtained from the denominator by replacing the a’s in the
one case, and the b’s in the other, by k’s.
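As a check on the formulas of (ii), they may be applied to the numerical pair of equations in § 1. The sketch below is illustrative; the function name and the way the coefficients are grouped into tuples are assumptions, and integer coefficients are supposed.

```python
from fractions import Fraction

def solve_two(a, b, k):
    """Apply the formulas of (ii); a, b, k are (a1, a2), (b1, b2), (k1, k2)."""
    (a1, a2), (b1, b2), (k1, k2) = a, b, k
    denominator = a1 * b2 - a2 * b1              # common to x and y
    if denominator == 0:
        raise ValueError("a1*b2 - a2*b1 is zero; the formulas do not apply")
    x = Fraction(k1 * b2 - k2 * b1, denominator)  # the a's replaced by k's
    y = Fraction(a1 * k2 - a2 * k1, denominator)  # the b's replaced by k's
    return x, y

# 5x + 2y = 19, 4x + 3y = 18
print(solve_two((5, 4), (2, 3), (19, 18)))
```

The denominator here is 15 - 8 = 7, the same number that appeared as the coefficient of x after the subtraction in § 1.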

(iii) Next take the case of three unknowns. Let the
equations be

$$\left.\begin{aligned} a_1 x + b_1 y + c_1 z &= k_1 \\ a_2 x + b_2 y + c_2 z &= k_2 \\ a_3 x + b_3 y + c_3 z &= k_3 \end{aligned}\right\}.$$

Eliminating z from the first two equations, we obtain

$$(a_1 c_2 - a_2 c_1)\,x + (b_1 c_2 - b_2 c_1)\,y = k_1 c_2 - k_2 c_1.$$

Similarly from the second and third equations

$$(a_2 c_3 - a_3 c_2)\,x + (b_2 c_3 - b_3 c_2)\,y = k_2 c_3 - k_3 c_2.$$

Then, eliminating y from these equations, we get

$$x = \frac{(k_1 c_2 - k_2 c_1)(b_2 c_3 - b_3 c_2) - (k_2 c_3 - k_3 c_2)(b_1 c_2 - b_2 c_1)}{(a_1 c_2 - a_2 c_1)(b_2 c_3 - b_3 c_2) - (a_2 c_3 - a_3 c_2)(b_1 c_2 - b_2 c_1)}.$$

As before, the numerator is got from the denominator
by replacing a’s by k’s, and we therefore need only consider
the denominator. Multiplying out, it becomes

$$\begin{aligned} & a_1 b_2 c_2 c_3 - a_1 b_3 c_2^2 - a_2 b_2 c_1 c_3 + a_2 b_3 c_1 c_2 - a_2 b_1 c_2 c_3 + a_2 b_2 c_1 c_3 + a_3 b_1 c_2^2 - a_3 b_2 c_1 c_2 \\ &\quad = c_2\,(a_1 b_2 c_3 - a_1 b_3 c_2 - a_2 b_1 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_3 b_2 c_1). \end{aligned}$$

Hence, replacing the a’s by k’s for the numerator,

$$x = \frac{k_1 b_2 c_3 - k_1 b_3 c_2 - k_2 b_1 c_3 + k_2 b_3 c_1 + k_3 b_1 c_2 - k_3 b_2 c_1}{a_1 b_2 c_3 - a_1 b_3 c_2 - a_2 b_1 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_3 b_2 c_1}.$$

Corresponding expressions can be obtained for y and for z.
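The six-term expression may be tried on the three numerical equations of § 1. In the sketch below the expressions for y and z are formed by replacing the b’s and the c’s respectively by k’s; this is an assumption made by analogy with the case of two unknowns, since the text does not write them out. The function names are hypothetical, and the third equation is again read as $-5x - 4y + z = 7$.

```python
from fractions import Fraction

def six_terms(p, q, r):
    """p1 q2 r3 - p1 q3 r2 - p2 q1 r3 + p2 q3 r1 + p3 q1 r2 - p3 q2 r1."""
    (p1, p2, p3), (q1, q2, q3), (r1, r2, r3) = p, q, r
    return (p1 * q2 * r3 - p1 * q3 * r2 - p2 * q1 * r3
            + p2 * q3 * r1 + p3 * q1 * r2 - p3 * q2 * r1)

def solve_three(a, b, c, k):
    """Solve for (x, y, z) with the common six-term denominator."""
    denominator = six_terms(a, b, c)
    if denominator == 0:
        raise ValueError("the common denominator is zero")
    x = Fraction(six_terms(k, b, c), denominator)   # a's replaced by k's
    y = Fraction(six_terms(a, k, c), denominator)   # b's replaced by k's (assumed)
    z = Fraction(six_terms(a, b, k), denominator)   # c's replaced by k's (assumed)
    return x, y, z

# columns of coefficients: a = (2, 1, -5), b = (5, -3, -4), c = (3, -2, 1)
print(solve_three((2, 1, -5), (5, -3, -4), (3, -2, 1), (4, -1, 7)))
```

For these equations the common denominator is -34, and the three numerators are -34, 68 and -136.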

I. 3. General problem. (i) We might proceed in the
same way for equations in four or more unknowns. But
this would mean that each case would have to be considered
separately ; and not only should we fail to get a general
formula, but the algebraical work would soon become
practically impossible. We therefore alter our tactics.




We write down the general equations involving m unknowns
x, y, z, ... w

$$\left.\begin{aligned} a_1 x + b_1 y + c_1 z + \dots + f_1 w &= k_1 \\ a_2 x + b_2 y + c_2 z + \dots + f_2 w &= k_2 \\ \dots\dots\dots\dots\dots\dots & \\ a_m x + b_m y + c_m z + \dots + f_m w &= k_m \end{aligned}\right\}, \qquad \text{(I. 3. A)}$$

guess at a solution, and then verify that this solution does
actually satisfy the equations.

(ii) The values of x, y, z ... w as found from these equations
will be in the form of fractions. We will consider
first the denominators. Putting together the results obtained
in § 2, for the cases of m = 1, 2, 3, we find that the
successive denominators, which we will call $D^{(1)}$, $D^{(2)}$, $D^{(3)}$,
are

$$\left.\begin{aligned} D^{(1)} &= a_1 \\ D^{(2)} &= a_1 b_2 - a_2 b_1 \\ D^{(3)} &= a_1 b_2 c_3 - a_1 b_3 c_2 - a_2 b_1 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_3 b_2 c_1 \end{aligned}\right\}. \qquad \text{(I. 3. 1)}$$

We want to obtain an expression $D^{(m)}$, which we should
guess to be the common denominator in the solution of
the equations (I. 3. A), and of which $D^{(1)}$, $D^{(2)}$, $D^{(3)}$ shall
be the particular cases for m = 1, 2, 3.

(iii) The three D’s in (I. 3. 1) have a general similarity,
which enables us to obtain a formula for $D^{(m)}$. It will be
seen that both in $D^{(2)}$ and in $D^{(3)}$ some of the terms have
sign $+$ and some have sign $-$. We will see first how
the terms are constructed, and then consider the question
of sign.

I. 4. Construction of terms. (i) For each of the three
D’s, for which the values of m are 1, 2, 3 respectively, each
term is the product of m factors, which are the coefficients
in the equations in § 2. In writing down these coefficients,


it is convenient to keep them in the relative positions in
which they occur in the equations. Thus we get

$$\begin{array}{ccc} \text{For } D^{(1)} & \text{For } D^{(2)} & \text{For } D^{(3)} \\[4pt] a_1 & \begin{matrix} a_1 & b_1 \\ a_2 & b_2 \end{matrix} & \begin{matrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{matrix} \end{array}$$

In each case we have a set of quantities arranged in the
form of a square. The individual quantities are called
the elements of the set ; the quantities in a vertical line
constitute a column, and the quantities in a horizontal
line constitute a row. The columns are numbered from
the left, and the rows from the top. The diagonal drawn
from the top left-hand corner, i.e. the diagonal through
$a_1$, is called the leading diagonal.

(ii) Each term contains m factors, which are taken from
the set in such a way that one (only) shall come from each
column and that one (only) shall come from each row. Also
$D^{(m)}$ contains every term which can be constructed in this
way. Take, for example, $D^{(3)}$. Since one factor is to come
from each column, the factors are an a, a b, and a c. The
a can be either $a_1$ or $a_2$ or $a_3$, i.e. it can be taken in three
ways ; when one of these three a’s has been taken, one row
has been used up, and the b can only be taken in two ways ;
and, when one of the two b’s has been taken, the c can
similarly only be taken in one way. There are therefore
3 . 2 . 1 = 6 possible combinations of factors ; and this is
the number of terms in $D^{(3)}$.

(iii) Another way of stating the thing is that, if we
keep to a fixed order a b c of the factors in each term,
the suffixes of the factors are the numbers 1 or 1 2 or 1 2 3,
arranged in different ways, and there is one term for each
of the possible arrangements.



(iv) We conclude that $D^{(m)}$ contains terms, each of which
is constructed by taking m factors from the set of $m \times m = m^2$
quantities

$$\begin{matrix} a_1 & b_1 & c_1 & \dots & f_1 \\ a_2 & b_2 & c_2 & \dots & f_2 \\ \vdots & & & & \vdots \\ a_m & b_m & c_m & \dots & f_m \end{matrix}$$

in such a way that there shall be one factor (only) taken
from each column and one (only) from each row ; there
being one term for each of the $m(m-1) \dots 1 = m!$ ways
in which this can be done. Or, which comes to the same
thing, that the terms are made up of factors a b c ... f with
suffixes 1 2 3 ... m, arranged in different ways, there being
one term for each of the m! possible different arrangements.
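The rule for constructing the terms amounts to listing every arrangement of the suffixes 1 2 3 ... m against the letters taken in a fixed order. A minimal sketch follows; the function name is hypothetical, and the string of letters is an illustrative stand-in for the book's a b c ... f, limited here to ten columns. Signs are not considered.

```python
from itertools import permutations

LETTERS = "abcdefghij"   # illustrative column letters; supports m up to 10

def terms(m):
    """List the m! products having one factor from each column and each row."""
    products = []
    for suffixes in permutations(range(1, m + 1)):
        # the j-th letter (column) takes the j-th suffix (row) of the arrangement
        products.append(" ".join(f"{LETTERS[j]}{s}" for j, s in enumerate(suffixes)))
    return products

print(terms(3))
print(len(terms(4)))
```

For m = 3 the six products come out in the same order in which the terms of the three-unknown denominator were written in § 3.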

I. 5. Rule of signs. (i) It will be seen that, in the D's 
after $D^{(1)}$, half of the terms are positive and half negative, 
and that in each case the term found from the elements in 
the leading diagonal, namely $a_1 b_2$ or $a_1 b_2 c_3$, is positive. 
We should therefore expect that half of the terms 
in $D^{(m)}$ would be positive and half negative, the term 
$a_1 b_2 c_3 \ldots f_m$ (which we call the leading term and usually 
place first) being positive. The difficult question is that 
of signs. In the case of (say) m = 5, how are we to know 
whether such a term as $a_3 b_5 c_2 d_4 e_1$ is to have the sign $+$ 
or $-$? 

(ii) The sign of a term must, if the letters a b c ... f are 
kept in their original order, depend on the arrangement of 
the suffixes, i.e. on the extent to which this has departed 
from the initial arrangement 1 2 3 ... m. Now any arrangement 
such as 35241 can be got from the initial arrangement 
12345 by a series of interchanges of adjacent figures. We 
must fix a definite order in which these interchanges are to 
be made. We therefore say that each figure is to be moved in 
turn, beginning with that which ultimately comes first, then 
that which ultimately comes second, and so on. Thus in this 
particular case the successive stages would (repeating for each 
group of interchanges the arrangement from which we start) 
be 12345, 13245, 31245; 1245, 1254, 1524, 5124; 124, 214; 
14, 41; 1: a total of seven interchanges. Now let us look 
at the signs in $D^{(2)}$ and $D^{(3)}$. In $D^{(2)}$ the arrangement 21 
is obtained from 12 by 1 interchange, and the sign for 21 
is $-$. In $D^{(3)}$ the signs of the successive terms, and the 
stages by which the final arrangements of suffixes are 
obtained, are as follows, the numbers of interchanges being 
given in the last column: 

    Sign    Stages                  Interchanges 
     +      123                     0 
     -      123, 132                1 
     -      123, 213                1 
     +      123, 213, 231           2 
     +      123, 132, 312           2 
     -      123, 132, 312, 321      3 

It will be seen that both for $D^{(2)}$ and for $D^{(3)}$ the sign is 
$-$ or $+$ according as the number of interchanges is odd 
or even. We therefore adopt this as our rule; in the case 
of $a_3 b_5 c_2 d_4 e_1$, for instance, seven interchanges are necessary, 
and the sign is therefore $-$. 
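
The fixed order of interchanges described in (ii) can be followed mechanically. The sketch below is one possible rendering; the name `interchange_stages` is hypothetical, and the stages are recorded as whole arrangements rather than in the abbreviated form used above.

```python
def interchange_stages(target):
    """Reach `target` from the natural order by adjacent interchanges.

    The figure that ultimately comes first is moved to the front, then
    the figure that ultimately comes second, and so on.
    Returns (list_of_stages, number_of_interchanges).
    """
    current = sorted(target)
    stages = [tuple(current)]
    count = 0
    for position, figure in enumerate(target):
        i = current.index(figure)
        while i > position:                 # shift `figure` one place left
            current[i - 1], current[i] = current[i], current[i - 1]
            i -= 1
            count += 1
            stages.append(tuple(current))
    return stages, count

stages, n = interchange_stages([3, 5, 2, 4, 1])
sign = "-" if n % 2 else "+"
print(n, sign)   # seven interchanges, sign -
```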

(iii) In order to find the sign of any given term by 
the above rule, it would be necessary to perform all 
the interchanges. A shorter method is to look at the 
term as it stands and to consider the reversals of order in 
it ; i. e. taking the suffixes of the term in pairs in every 
possible way without altering their order, to see in how 
many cases the numbers are in the reverse of their order in 
the leading term, i.e. are in descending instead of ascending 
order. The term $a_3 b_5 c_2 d_4 e_1$, for instance, gives the following 
pairs, those in which the order is reversed being marked 
with an asterisk: 35, 32*, 34, 31*, 52*, 54*, 51*, 24, 21*, 41*. 



It is easily seen that each interchange, of the kind described 
in (ii) above, produces one reversal of order ; for, while we 
are shifting one number, such as the 5 in the second group 
of arrangements there shown, the relative order of the 
other numbers remains unaltered. It follows that the 
number of reversals of order is the same as the number 
of interchanges of this kind ; and therefore the sign of 
a term will be — or + according as the number of reversals 
of order is odd or even. 
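
The shorter method of (iii) amounts to counting the pairs that stand in descending order. A minimal sketch, with the function names `reversals` and `sign` chosen for the example:

```python
from itertools import combinations

def reversals(suffixes):
    """Count pairs, taken without altering their order, that descend."""
    return sum(1 for x, y in combinations(suffixes, 2) if x > y)

def sign(suffixes):
    """Return -1 or +1 according as the number of reversals is odd or even."""
    return -1 if reversals(suffixes) % 2 else 1

print(reversals([3, 5, 2, 4, 1]), sign([3, 5, 2, 4, 1]))   # 7 -1
```

For 35241 this gives seven reversals, agreeing with the seven interchanges counted in (ii).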

(iv) The interchange of any two suffixes in a term changes 
the sign of the term. 

[Let the two suffixes be $\phi$ and $\psi$; $\phi$ coming before $\psi$ in the 
term in question, but not necessarily being before it in numerical 
order. 

(1) First let $\phi$ and $\psi$ be adjacent. Then the interchange of 
$\phi$ and $\psi$ increases or decreases the number of reversals of order 
by 1, and therefore changes the sign of the term. 

(2) Next suppose that there are x suffixes between $\phi$ and $\psi$. 
Then we can move $\psi$ in front of $\phi$ by x + 1 interchanges with 
the adjacent term, and then move $\phi$ into the original position 
of $\psi$ by x interchanges. This is a total of 2x + 1 interchanges, 
each of which in succession makes a change of sign: the total 
result is to change the sign of the term.] 

(v) We have so far assumed that the factors of a term 
are arranged in the original order of the letters a b c d .... 
Now suppose that the order of the factors is altered in any 
way. How does this affect the rule of signs? 

The alteration of order can be brought about by a series 
of interchanges of factors. Suppose there is an interchange 
of $a_\phi$ and $b_\psi$. Then, by (iv), the number of reversals of 
order of suffixes is altered by an odd number, but the 
number of reversals of order of letters is also altered by 
an odd number; and therefore, if we consider the sum 
of the numbers of reversals of order of letters and of 
suffixes, this sum either is not altered or is altered by an 
even number. It follows that, if the factors of a term have 
been shifted about so that the letters a b c ... are not in their 
original order, the sign of the term depends on the sum of the 
numbers of reversals of order of letters and of suffixes 
respectively, being $-$ or $+$ according as this sum is odd or even. 
For example, in $d_4 b_5 c_2 a_3 e_1$ there are five reversals of order 
of the letters and eight of the suffixes, so that the sign is $-$. 


II. PROPERTIES OF DETERMINANTS 

II. 1. Definition of determinant. We can now combine 
the results obtained in I. 4 and I. 5. We suppose that 
we are dealing with a set of $m \times m = m^2$ quantities, which 
we can arrange in the form of a square, thus (the quantities 
being denoted by crosses): 

    X X X ... X 
    X X X ... X 
    . . . . . . 
    X X X ... X 

Then the expression which we have to consider is the 
algebraical sum of a number of terms, of which some are 
taken positively and some negatively. Each term (apart 
from sign) is the product of m elements of the set, taken 
in such a way that one element (only) shall come from each 
column and that one element (only) shall come from 
each row; and there are m! terms, corresponding to the 
m! different ways in which this can be done. The leading 
term is the term containing the elements in the leading 
diagonal of the square, and has sign $+$. The signs of the 
other terms are to be found by replacing the elements of 
the set by $a_1, a_2, \ldots, b_1$, etc., arranged as a key set: 

$$\begin{matrix} a_1 & b_1 & c_1 & \ldots & f_1 \\ a_2 & b_2 & c_2 & \ldots & f_2 \\ \vdots & \vdots & \vdots & & \vdots \\ a_m & b_m & c_m & \ldots & f_m \end{matrix}$$

The sign of a term is then $-$ or $+$ according as the sum of 
the numbers of reversals of order of the letters and of the 
suffixes, as compared with the leading term $a_1 b_2 c_3 \ldots f_m$, 
is odd or even. 



The algebraical sum of the terms so obtained, namely 
$a_1 b_2 c_3 \ldots f_m$ + etc., is called the determinant 
of the set, and will be denoted by D. It should however 
be observed that it is the determinant of the set as so 
arranged; with different arrangements of the elements of 
a set, still keeping them in a square, we may obtain 
different determinants. 

We can therefore define the determinant as the algebraical 
sum of terms of the form $a_p b_q c_r \ldots$, where p q r ... 
are the numbers 1 2 3 ... m arranged in some order, there 
being a term for each of the m! possible orders, and the sign 
prefixed to the term being $+$ for the natural order 1 2 3 ... m 
and $-$ or $+$ for other orders according as the number of 
reversals of natural order is odd or even. 
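
This definition translates directly into a short program. The sketch below assumes the determinant is supplied as a list of rows, so that `rows[r][q]` is the element in row r and column q (counting from 0); the name `det` is chosen for the example.

```python
from itertools import combinations, permutations
from math import prod

def det(rows):
    """Sum of signed terms, one factor from each column and each row."""
    m = len(rows)
    total = 0
    for perm in permutations(range(m)):
        # perm[q] is the row (suffix) attached to column (letter) q
        reversals = sum(1 for x, y in combinations(perm, 2) if x > y)
        term = prod(rows[perm[q]][q] for q in range(m))
        total += -term if reversals % 2 else term
    return total

print(det([[1, 2], [3, 4]]))   # 1*4 - 3*2 = -2
```

The work grows as m!, so this form serves for checking small cases rather than for computation.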

The symbol for the determinant is constructed by placing 
single vertical lines before and after the set; thus 

$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}$$

means the determinant $a_1 b_2 c_3 -$ etc., which we have called $D^{(3)}$. 
The terminology is the same as is given in I. 4 and I. 5 
for a set. The quantities between the vertical lines are 
the elements of the determinant. Those in a vertical line 
are a column ; those in a horizontal line are a row. The 
leading diagonal is the diagonal drawn from the top left- 
hand corner ; and the leading term is that containing the 
elements through which the leading diagonal passes. The 
leading term, as already stated, is taken positively. 

If the symbol for a determinant contains m columns and 
m rows, the determinant is said to be of the mth order. 

II. 2. Elementary properties. (i) From the mode of 
construction it follows that each element of the determinant 
appears in (m - 1)! out of the m! terms; and there are no 
two terms having more than m - 2 factors alike. 

(ii) If each element of a column or of a row is 0, the 
determinant is = 0. [For each term contains one of these 
elements as a factor.] 

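II. 3. Properties depending on the rule of signs. 

(i) The value of a determinant is not altered by making the columns 
rows and the rows columns: e.g., for m = 4, 

$$\begin{vmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \\ a_4 & b_4 & c_4 & d_4 \end{vmatrix} = \begin{vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \\ d_1 & d_2 & d_3 & d_4 \end{vmatrix}.$$
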

[Let D be the determinant, and R the new determinant obtained 
by making columns rows and rows columns in the symbol for D. 
Then, apart from sign, D and R obviously have the same m! terms. 
We have therefore only to consider signs. The two determinants 
have the same leading term, which is positive in both. Let t be 
any other term of D, say $a_3 b_4 c_2 d_1 \ldots$. Then t also occurs in R, 
but, since the terms of R must be constructed according to the 
system prescribed in our definition of a determinant, the factors 
of t in R will be arranged in the numerical order of the suffixes, 
namely $d_1 c_2 a_3 b_4 \ldots$. The sign of t in D depends on the number 
of reversals of order in the suffixes 3 4 2 1 ..., and the sign of 
t in R depends on the number of reversals of order in the letters 
d c a b .... But each of these numbers is the sum of the numbers 
of reversals of order of letters and of suffixes as compared with the 
original orders a b c d ... and 1 2 3 4 ...; and, by I. 5 (v), these 
sums are either both odd or both even. It follows that the sign of 
t is the same in both determinants. This is true for each term 
of D or R; and the two determinants are therefore equal.] 

If two determinants correspond so that the columns 
of one are the rows of the other, each determinant is said 
to be the transposed of the other. 




(ii) It follows from (i) that any statement as to columns 
or rows is equally true for rows or columns. We shall 
indicate this, for conciseness, by 'column [row]' or 'row 
[column]'. 

(iii) If any two columns [rows] of a determinant are interchanged, 
the absolute magnitude of the determinant remains 
unaltered, but its sign is changed. 

[Suppose, e.g., that we interchange the b's and the e's. Let 
$\phi$ and $\psi$ be any two suffixes. Then, in the original determinant, 
corresponding to any term which contains $b_\phi$ and $e_\psi$ there is 
another term exactly similar except that the factors are $b_\psi$ and $e_\phi$; 
and these two terms, by I. 5 (iv), are of opposite sign. The effect 
of interchanging the b's and the e's is that the two terms are 
interchanged, i.e. the sign of each is changed. This applies to every 
such pair of terms.] 


(iv) If any two columns [rows] of a determinant are identical, 
the determinant is = 0. 

[We can see this in either of two ways. 

(1) Consider a pair of terms such as are mentioned in (iii). 
The one contains $b_\phi$ and $e_\psi$; the other is exactly similar, except 
that it contains $b_\psi$ and $e_\phi$; and the two terms have opposite 
signs. If $b_\phi = e_\phi$ and $b_\psi = e_\psi$, the two terms cancel. The whole 
determinant is made up of such pairs. 

(2) More briefly, suppose we interchange the two columns 
which are identical. Then the determinant remains unaltered. 
But, by (iii), its sign is changed. This can only be the case if 
the determinant is 0.] 


(v) Since columns and rows may be interchanged, a 
determinant is sometimes represented by its leading diagonal 
alone, if this indicates a system for insertion of the 
remaining elements. The notation is 

$$| a_1 b_2 c_3 \ldots f_m | \equiv \begin{vmatrix} a_1 & b_1 & c_1 & \ldots & f_1 \\ a_2 & b_2 & c_2 & \ldots & f_2 \\ \vdots & \vdots & \vdots & & \vdots \\ a_m & b_m & c_m & \ldots & f_m \end{vmatrix};$$

it is then immaterial, so far as the value of the determinant 
is concerned, whether we enter the a's as a column or 
as a row. But it should be mentioned that the relative 
arrangement of columns and rows is of importance later on, 
when we come to consider properties of sets of quantities. 


II. 4. Cofactors and minors. (i) In the complete 
expression for D, each term contains one a, which is either 
$a_1$ or $a_2$ or $a_3$ etc. We can therefore group the terms 
according to the a's they contain. In the case of $D^{(3)}$, for 
instance, 

$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} = a_1 (b_2 c_3 - b_3 c_2) + a_2 (-b_1 c_3 + b_3 c_1) + a_3 (b_1 c_2 - b_2 c_1).$$

Suppose that the terms of D are grouped in this way; and 
let the resulting coefficients of $a_1, a_2, a_3, \ldots a_m$ be denoted 
by $A_1, A_2, A_3, \ldots A_m$. Then 

$$D = a_1 A_1 + a_2 A_2 + a_3 A_3 + \ldots + a_m A_m.$$

Similarly, if we group the terms according to the b's or 
c's etc. they contain and denote the coefficients of the b's 
or c's etc. by $B_1, B_2, B_3, \ldots B_m$ or $C_1, C_2, C_3, \ldots C_m$ etc., we 
shall have 

$$D = b_1 B_1 + b_2 B_2 + b_3 B_3 + \ldots + b_m B_m,$$
$$D = c_1 C_1 + c_2 C_2 + c_3 C_3 + \ldots + c_m C_m,$$
etc. 

The A's, B's, etc., are called the cofactors of the corresponding 
elements of the determinant; thus the cofactor of $b_3$ is 
$B_3$, where $b_3 B_3$ is the sum of all the terms which contain $b_3$. 


(ii) The terms which contain the leading element $a_1$ are 
obtained from the leading term $a_1 b_2 c_3 \ldots f_m$ by altering 
the suffixes, and prefixing the proper sign to the term, in 
the manner already described, with the proviso that the 
factor $a_1$ remains unaltered. But this process will give us 
the products, by $a_1$, of the terms so constructed from 
a leading term $b_2 c_3 \ldots f_m$. In other words, the cofactor 
of $a_1$ is the determinant 

$$\begin{vmatrix} b_2 & c_2 & \ldots & f_2 \\ b_3 & c_3 & \ldots & f_3 \\ \vdots & \vdots & & \vdots \\ b_m & c_m & \ldots & f_m \end{vmatrix}.$$

In $D^{(3)}$, for example, it is 

$$b_2 c_3 - b_3 c_2 = \begin{vmatrix} b_2 & c_2 \\ b_3 & c_3 \end{vmatrix}.$$


(iii) The determinant which is obtained from D by 
striking out the column and the row which contain any 
element of the determinant is called the minor of that 
element in the determinant. 

We see from (ii) that 

$A_1$ = cofactor of $a_1$ = minor of $a_1$. 

We might show in the same way that the cofactor of 
any other element, say $c_1$, is equal to the minor of that 
element, with the sign $-$ or $+$ prefixed according to the 
position of the element in the determinant: but it is 
simpler to find the cofactor by bringing the element into 
the position of $a_1$. Let the element be in the qth column 
and the rth row. Then we can make it the leading 
element, without altering the order of the other columns 
or rows, by means of q - 1 interchanges of its column with 
an adjoining column and r - 1 interchanges of its row with 
an adjoining row. Each of these interchanges, by II. 3 (iii), 
multiplies the determinant by -1; and the total result is 
to multiply by -1 or by +1 according as q + r - 2 is odd 
or even. Having got the element into the position of the 
leading element, we strike out the first column and the first 
row; the result, apart from the prefixed sign, is still to give 
the minor of the element, since the relative positions of the 
other columns and rows are unaltered. Hence the cofactor 
of any element is equal to its minor with the sign $-$ or $+$ 
prefixed according as the number of steps from the leading element 
to this element is odd or even; it being understood that each 
step is either horizontally from one column to the next or 
vertically from one row to the next. For example, 

$A_3$ = + minor of $a_3$, 
$C_4$ = - minor of $c_4$, 
etc. 
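
The rule for the sign of a cofactor can be sketched as follows. The representation (a list of rows, indices counted from 0, so that the number of steps from the leading element is r + q) and the names `det`, `minor` and `cofactor` are assumptions of the example; the numerical determinant is the one worked in IV. 4 (i) below.

```python
def det(rows):
    """Expand by the first column: D = a1*A1 + a2*A2 + ... + am*Am."""
    if len(rows) == 1:
        return rows[0][0]
    return sum(rows[r][0] * cofactor(rows, r, 0) for r in range(len(rows)))

def minor(rows, r, q):
    """Strike out row r and column q, and take the determinant of the rest."""
    return det([row[:q] + row[q + 1:] for i, row in enumerate(rows) if i != r])

def cofactor(rows, r, q):
    """Minor with sign - or + as the number of steps r + q is odd or even."""
    return (-1) ** (r + q) * minor(rows, r, q)

D = [[3, -2, 7],
     [5, 1, -3],
     [4, 6, 1]]
print([cofactor(D, r, 0) for r in range(3)])   # A1, A2, A3
```

Here `minor(D, 1, 0)` is -44 while `cofactor(D, 1, 0)` is +44, the one step from the leading element changing the sign.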


(iv) We have found in (i) that 

$$\left. \begin{aligned} a_1 A_1 + a_2 A_2 + a_3 A_3 + \ldots + a_m A_m &= D \\ b_1 B_1 + b_2 B_2 + b_3 B_3 + \ldots + b_m B_m &= D \\ \text{etc.} \end{aligned} \right\} \qquad \text{(II. 4. 1)}$$

We have now to find the value of such sums as 

$$a_1 B_1 + a_2 B_2 + a_3 B_3 + \ldots + a_m B_m,$$
$$b_1 A_1 + b_2 A_2 + b_3 A_3 + \ldots + b_m A_m,$$
$$c_1 A_1 + c_2 A_2 + c_3 A_3 + \ldots + c_m A_m,$$
etc. 

Let us take the second and third of these as examples, 
but replace the b's or the c's by $\theta$'s. Then we want to 
find the value of 

$$\theta_1 A_1 + \theta_2 A_2 + \theta_3 A_3 + \ldots + \theta_m A_m.$$

Now we see from (II. 4. 1) that this is the value of the 
determinant 

$$\begin{vmatrix} \theta_1 & b_1 & c_1 & \ldots & f_1 \\ \theta_2 & b_2 & c_2 & \ldots & f_2 \\ \theta_3 & b_3 & c_3 & \ldots & f_3 \\ \vdots & \vdots & \vdots & & \vdots \\ \theta_m & b_m & c_m & \ldots & f_m \end{vmatrix};$$

for the cofactors of $\theta_1, \theta_2, \theta_3, \ldots \theta_m$ in this determinant are 
the same as the cofactors of $a_1, a_2, a_3, \ldots a_m$ in D, i.e. are 
$A_1, A_2, A_3, \ldots A_m$. Let us replace $\theta$ throughout this 
determinant by any letter, other than a, occurring in D, e.g. 
by c. Then the determinant becomes 

$$\begin{vmatrix} c_1 & b_1 & c_1 & \ldots & f_1 \\ c_2 & b_2 & c_2 & \ldots & f_2 \\ c_3 & b_3 & c_3 & \ldots & f_3 \\ \vdots & \vdots & \vdots & & \vdots \\ c_m & b_m & c_m & \ldots & f_m \end{vmatrix}.$$

But this is a determinant which has two columns identical, 
and its value, by § 3 (iv), is 0. Hence 

$$\left. \begin{aligned} c_1 A_1 + c_2 A_2 + c_3 A_3 + \ldots + c_m A_m &= 0 \\ \text{Similarly} \quad b_1 A_1 + b_2 A_2 + b_3 A_3 + \ldots + b_m A_m &= 0 \\ a_1 B_1 + a_2 B_2 + a_3 B_3 + \ldots + a_m B_m &= 0 \\ \text{etc.} \end{aligned} \right\} \qquad \text{(II. 4. 2)}$$
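
Relations (II. 4. 1) and (II. 4. 2) can be verified together on a numerical case. The sketch assumes a list-of-rows representation with indices from 0; `column_times_cofactors` is a hypothetical helper that multiplies the elements of one column by the cofactors of another.

```python
def det(rows):
    """Expand by the first column."""
    if len(rows) == 1:
        return rows[0][0]
    return sum(rows[r][0] * cofactor(rows, r, 0) for r in range(len(rows)))

def cofactor(rows, r, q):
    """Signed minor of the element in row r, column q."""
    sub = [row[:q] + row[q + 1:] for i, row in enumerate(rows) if i != r]
    return (-1) ** (r + q) * det(sub)

def column_times_cofactors(rows, q_elements, q_cofactors):
    """Sum of elements of one column times cofactors of another column."""
    return sum(rows[r][q_elements] * cofactor(rows, r, q_cofactors)
               for r in range(len(rows)))

D = [[3, -2, 7],
     [5, 1, -3],
     [4, 6, 1]]
table = [[column_times_cofactors(D, p, q) for q in range(3)] for p in range(3)]
print(table)   # D on the diagonal, 0 elsewhere
```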


(v) Now let us interchange the columns with the rows, 
so that the determinant becomes 

$$\begin{vmatrix} a_1 & a_2 & a_3 & \ldots & a_m \\ b_1 & b_2 & b_3 & \ldots & b_m \\ c_1 & c_2 & c_3 & \ldots & c_m \\ \vdots & \vdots & \vdots & & \vdots \\ f_1 & f_2 & f_3 & \ldots & f_m \end{vmatrix}.$$

Then, by § 3 (i), the value of the new determinant is the 
same as that of the old, i.e. is D. Also the minor of any 
element e in the new determinant is the same as its minor 
in the old determinant, but with columns and rows 
interchanged, so that its value is unaltered; and the number of 
steps from $a_1$ to e is the same in both determinants. The 
cofactor of e in the new determinant is therefore equal to 
its cofactor in the old determinant. Hence by applying 
(II. 4. 1) and (II. 4. 2) to the new determinant we get 
new sets of relations, namely 

$$\left. \begin{aligned} a_1 A_1 + b_1 B_1 + c_1 C_1 + \ldots + f_1 F_1 &= D \\ a_2 A_2 + b_2 B_2 + c_2 C_2 + \ldots + f_2 F_2 &= D \\ \text{etc.} \end{aligned} \right\} \qquad \text{(II. 4. 3)}$$

and 

$$\left. \begin{aligned} a_1 A_2 + b_1 B_2 + c_1 C_2 + \ldots + f_1 F_2 &= 0 \\ a_2 A_1 + b_2 B_1 + c_2 C_1 + \ldots + f_2 F_1 &= 0 \\ \text{etc.} \end{aligned} \right\} \qquad \text{(II. 4. 4)}$$


(vi) If all the elements in a column [row], except one, 
are 0, the determinant is equal to the product of that one 
by its cofactor. 


III. SOLUTION OF SIMULTANEOUS 
EQUATIONS 


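III. 1. Statement of previous results. We have 
next to consider the solution of the simultaneous equations 
(I. 3. A). Before we do this, it will be convenient to 
express in determinant form the results obtained in I. 2. 
These results are as follows: 

(1) If $a_1 x = k_1$, then* $x = |k_1| \div |a_1|$. 

(2) If 

$$a_1 x + b_1 y = k_1, \qquad a_2 x + b_2 y = k_2,$$

then 

$$x = \begin{vmatrix} k_1 & b_1 \\ k_2 & b_2 \end{vmatrix} \div \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix}, \qquad y = \begin{vmatrix} a_1 & k_1 \\ a_2 & k_2 \end{vmatrix} \div \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix}.$$

(3) If 

$$a_1 x + b_1 y + c_1 z = k_1, \qquad a_2 x + b_2 y + c_2 z = k_2, \qquad a_3 x + b_3 y + c_3 z = k_3,$$

then 

$$x = \begin{vmatrix} k_1 & b_1 & c_1 \\ k_2 & b_2 & c_2 \\ k_3 & b_3 & c_3 \end{vmatrix} \div D^{(3)}, \qquad y = \begin{vmatrix} a_1 & k_1 & c_1 \\ a_2 & k_2 & c_2 \\ a_3 & k_3 & c_3 \end{vmatrix} \div D^{(3)}, \qquad z = \begin{vmatrix} a_1 & b_1 & k_1 \\ a_2 & b_2 & k_2 \\ a_3 & b_3 & k_3 \end{vmatrix} \div D^{(3)},$$

where 

$$D^{(3)} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}.$$

* Here, as elsewhere, vertical lines denote a determinant, not 
'absolute value'. 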




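III. 2. General solution. The general equations of 
which we require a solution are those set out in (I. 3. A), 
namely: 

$$\left. \begin{aligned} a_1 x + b_1 y + c_1 z + \ldots + f_1 w &= k_1 \\ a_2 x + b_2 y + c_2 z + \ldots + f_2 w &= k_2 \\ \vdots \qquad & \\ a_m x + b_m y + c_m z + \ldots + f_m w &= k_m \end{aligned} \right\} \qquad \text{(III. 2. A)}$$

The form of the solution is suggested by the results 
given above. Multiplying the successive equations by 
$A_1, A_2, \ldots A_m$, and adding, we have 

$$\begin{aligned} & (a_1 A_1 + a_2 A_2 + \ldots + a_m A_m)\, x \\ & + (b_1 A_1 + b_2 A_2 + \ldots + b_m A_m)\, y \\ & + (c_1 A_1 + c_2 A_2 + \ldots + c_m A_m)\, z \\ & + \ldots \\ & + (f_1 A_1 + f_2 A_2 + \ldots + f_m A_m)\, w = k_1 A_1 + k_2 A_2 + \ldots + k_m A_m. \end{aligned}$$

By (II. 4. 1) and (II. 4. 2) the coefficient of x is equal 
to D, and those of y, z, ... w are equal to 0. Also the 
expression on the right-hand side is what D would become 
if we replaced the a's by k's. Hence 

$$x = (k_1 A_1 + k_2 A_2 + \ldots + k_m A_m) \div D = \begin{vmatrix} k_1 & b_1 & c_1 & \ldots & f_1 \\ k_2 & b_2 & c_2 & \ldots & f_2 \\ \vdots & \vdots & \vdots & & \vdots \\ k_m & b_m & c_m & \ldots & f_m \end{vmatrix} \div \begin{vmatrix} a_1 & b_1 & c_1 & \ldots & f_1 \\ a_2 & b_2 & c_2 & \ldots & f_2 \\ \vdots & \vdots & \vdots & & \vdots \\ a_m & b_m & c_m & \ldots & f_m \end{vmatrix}.$$

Similarly 

$$y = \begin{vmatrix} a_1 & k_1 & c_1 & \ldots & f_1 \\ a_2 & k_2 & c_2 & \ldots & f_2 \\ \vdots & \vdots & \vdots & & \vdots \\ a_m & k_m & c_m & \ldots & f_m \end{vmatrix} \div \begin{vmatrix} a_1 & b_1 & c_1 & \ldots & f_1 \\ a_2 & b_2 & c_2 & \ldots & f_2 \\ \vdots & \vdots & \vdots & & \vdots \\ a_m & b_m & c_m & \ldots & f_m \end{vmatrix},$$

and so on. 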


If, for verification, we substitute these values in the 
original equations, it will be found that the relations 
(II. 4. 3) and (II. 4. 4) come into play. 
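
The general solution can be sketched in Python by replacing each column in turn by the k's. The names `det` and `solve`, the use of exact fractions, and the small numerical system are all assumptions made for the illustration.

```python
from fractions import Fraction

def det(rows):
    """Determinant by expansion along the first column."""
    if len(rows) == 1:
        return rows[0][0]
    total = 0
    for r in range(len(rows)):
        sub = [row[1:] for i, row in enumerate(rows) if i != r]
        total += (-1) ** r * rows[r][0] * det(sub)
    return total

def solve(rows, k):
    """Each unknown is (D with one column replaced by the k's) divided by D."""
    D = det(rows)
    if D == 0:
        raise ValueError("the determinant of the coefficients is 0")
    solution = []
    for q in range(len(rows)):
        replaced = [row[:q] + [k[r]] + row[q + 1:] for r, row in enumerate(rows)]
        solution.append(Fraction(det(replaced), D))
    return solution

# 2x + y = 5,  x + 3y = 10
print(solve([[2, 1], [1, 3]], [5, 10]))   # x = 1, y = 3
```

The method assumes that D is not 0; the sketch raises an error in that case rather than attempting a solution.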


IV. PROPERTIES OF DETERMINANTS 

(continued) 

IV. 1. Sum of determinants. If two determinants 
are identical except as regards one column [row], their sum 
is a similar determinant in which the elements of that 
column [row] are the sums of corresponding elements in 
the two determinants. [For example 

$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} + \begin{vmatrix} d_1 & b_1 & c_1 \\ d_2 & b_2 & c_2 \\ d_3 & b_3 & c_3 \end{vmatrix} = \begin{vmatrix} a_1 + d_1 & b_1 & c_1 \\ a_2 + d_2 & b_2 & c_2 \\ a_3 + d_3 & b_3 & c_3 \end{vmatrix},$$

since it is $= (a_1 A_1 + a_2 A_2 + a_3 A_3) + (d_1 A_1 + d_2 A_2 + d_3 A_3)$ 
$= (a_1 + d_1) A_1 + (a_2 + d_2) A_2 + (a_3 + d_3) A_3$.] 

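IV. 2. Multiplication of determinant by a single 
factor. If each element of a column [row] is multiplied 
by the same factor, the determinant is multiplied by that 
factor. [For example 

$$\begin{vmatrix} \lambda a_1 & b_1 & c_1 \\ \lambda a_2 & b_2 & c_2 \\ \lambda a_3 & b_3 & c_3 \end{vmatrix} = \lambda a_1 A_1 + \lambda a_2 A_2 + \lambda a_3 A_3 = \lambda \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}.]$$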


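IV. 3. Alteration of column or row. If the elements 
of a column [row] are multiplied by a single factor and added 
to the corresponding elements of another column [row], the 
value of the determinant is not altered. [For example 

$$\begin{vmatrix} a_1 + \lambda c_1 & b_1 & c_1 \\ a_2 + \lambda c_2 & b_2 & c_2 \\ a_3 + \lambda c_3 & b_3 & c_3 \end{vmatrix} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} + \lambda \begin{vmatrix} c_1 & b_1 & c_1 \\ c_2 & b_2 & c_2 \\ c_3 & b_3 & c_3 \end{vmatrix} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}.]$$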




IV. 4. Calculation of determinant. (i) There are two 
main methods for calculation of a numerical determinant. 

(a) When m is small, we can use the formula 

$$D = a_1 A_1 + a_2 A_2 + \ldots + a_m A_m,$$

repeating the process as often as may be necessary. Thus, if 

$$D = \begin{vmatrix} 3 & -2 & 7 \\ 5 & 1 & -3 \\ 4 & 6 & 1 \end{vmatrix},$$

then 

$$D = 3 \begin{vmatrix} 1 & -3 \\ 6 & 1 \end{vmatrix} - 5 \begin{vmatrix} -2 & 7 \\ 6 & 1 \end{vmatrix} + 4 \begin{vmatrix} -2 & 7 \\ 1 & -3 \end{vmatrix} = 3 \cdot 19 - 5(-44) + 4(-1) = 273.$$

(b) When m is large, we can reduce the determinant 
to one of order m - 1 by means of § 3 and II. 4 (vi). 
Applying this method to the above example, we could 
multiply the first row by 5/3 and subtract from the second, 
and also multiply it by 4/3 and subtract from the third. 
To avoid fractions, we multiply D twice by 3. Then 

$$9D = \begin{vmatrix} 3 & -2 & 7 \\ 15 & 3 & -9 \\ 12 & 18 & 3 \end{vmatrix} = \begin{vmatrix} 3 & -2 & 7 \\ 0 & 13 & -44 \\ 0 & 26 & -25 \end{vmatrix} = 3 \times \begin{vmatrix} 13 & -44 \\ 26 & -25 \end{vmatrix} = 3(-325 + 1144) = 2{,}457,$$

$$D = 273.$$
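
Method (b) can be sketched as a general routine. The version below is an assumption-laden illustration rather than the text's own procedure: it works with exact fractions instead of multiplying D by 3 to avoid them, and it interchanges rows (changing the sign, by II. 3 (iii)) when a leading element is 0.

```python
from fractions import Fraction

def det_by_reduction(rows):
    """Reduce below each leading element by row subtraction, then multiply."""
    a = [[Fraction(x) for x in row] for row in rows]
    m = len(a)
    result = Fraction(1)
    for j in range(m):
        pivot = next((r for r in range(j, m) if a[r][j] != 0), None)
        if pivot is None:
            return Fraction(0)            # a column of 0's below the diagonal
        if pivot != j:
            a[j], a[pivot] = a[pivot], a[j]
            result = -result              # interchange of rows changes the sign
        result *= a[j][j]
        for r in range(j + 1, m):
            factor = a[r][j] / a[j][j]
            a[r] = [x - factor * y for x, y in zip(a[r], a[j])]
    return result

D = [[3, -2, 7],
     [5, 1, -3],
     [4, 6, 1]]
print(det_by_reduction(D))   # 273
```

After the first step the second and third rows are (0, 13/3, -44/3) and (0, 26/3, -25/3), which are the rows of the reduced determinant above divided by 3.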


(ii) For algebraical determinants various devices have to 
be used. An important determinant is 

$$D = \begin{vmatrix} a^{m-1} & b^{m-1} & c^{m-1} & \ldots & e^{m-1} & f^{m-1} \\ a^{m-2} & b^{m-2} & c^{m-2} & \ldots & e^{m-2} & f^{m-2} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a & b & c & \ldots & e & f \\ 1 & 1 & 1 & \ldots & 1 & 1 \end{vmatrix}.$$

This is = 0 if a = b or if a = c, etc. Hence it contains 
a - b, a - c, ... a - e, a - f as factors. Similarly it contains 
b - c, ... b - e, b - f; and so on. Looking to the 
leading term, it will be seen that there can be no other 
factors; i.e. 

$$D = (a-b)(a-c) \ldots (a-e)(a-f) \cdot (b-c) \ldots (b-e)(b-f) \ldots (e-f).$$
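
The factorisation can be checked numerically for particular values of the letters. In this sketch the function names and the sample values are chosen for illustration only.

```python
from itertools import combinations
from math import prod

def det(rows):
    """Determinant by expansion along the first column."""
    if len(rows) == 1:
        return rows[0][0]
    total = 0
    for r in range(len(rows)):
        sub = [row[1:] for i, row in enumerate(rows) if i != r]
        total += (-1) ** r * rows[r][0] * det(sub)
    return total

def power_determinant(values):
    """Rows hold the powers m-1, m-2, ..., 1, 0 of the given values."""
    m = len(values)
    return det([[v ** p for v in values] for p in range(m - 1, -1, -1)])

def product_of_differences(values):
    """(a-b)(a-c)...(e-f), each earlier letter minus each later one."""
    return prod(x - y for x, y in combinations(values, 2))

values = [5, 3, 2, 7]
print(power_determinant(values) == product_of_differences(values))   # True
```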


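[Example. Hence prove that 

$$\begin{vmatrix} b^{m-1} & c^{m-1} & \ldots & e^{m-1} & f^{m-1} \\ b^{m-2} & c^{m-2} & \ldots & e^{m-2} & f^{m-2} \\ \vdots & \vdots & & \vdots & \vdots \\ b^{r+1} & c^{r+1} & \ldots & e^{r+1} & f^{r+1} \\ b^{r-1} & c^{r-1} & \ldots & e^{r-1} & f^{r-1} \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & 1 & \ldots & 1 & 1 \end{vmatrix} = (-)^{m-r+1} \, \frac{D}{\phi(a)} \times \text{coefficient of } a^r \text{ in } \phi(a),$$

where 

$$\phi(a) = (a-b)(a-c) \ldots (a-e)(a-f).]$$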


IV. 5. Product of determinants. (i) The product of 
a determinant of order m and a determinant of order n can 
be expressed as a determinant of order m + n by placing 
the leading diagonals in line and filling in with 0's. For 
example 

$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} \times \begin{vmatrix} d_4 & e_4 \\ d_5 & e_5 \end{vmatrix} = \begin{vmatrix} a_1 & b_1 & c_1 & 0 & 0 \\ a_2 & b_2 & c_2 & 0 & 0 \\ a_3 & b_3 & c_3 & 0 & 0 \\ 0 & 0 & 0 & d_4 & e_4 \\ 0 & 0 & 0 & d_5 & e_5 \end{vmatrix}.$$

[For the only terms of this latter determinant which are 
not 0 are those for which the first three factors (collectively) 
are taken from the first three columns and rows and the 
next two factors (collectively) from the last two columns 
and rows; and in each such case the first three factors 
form a term of the first determinant and the next two 
factors form a term of the second determinant. Thus all 
the terms of the product of the one expanded determinant 
by the other are accounted for, and there are no others.] 

(ii) We have now to show that the product of two 
determinants, each of order m, can be expressed as a 
determinant of order m. To obtain the general formula, 
it will be sufficient to take a particular case, e.g. 

$$D = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}, \qquad E = \begin{vmatrix} \alpha_1 & \beta_1 & \gamma_1 \\ \alpha_2 & \beta_2 & \gamma_2 \\ \alpha_3 & \beta_3 & \gamma_3 \end{vmatrix},$$

provided that in our reasoning we retain m as the order of 
each determinant. 

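By means of the first sentence of (i) we can write down 
DE as a determinant of order 2m, i.e. 

$$\begin{vmatrix} a_1 & b_1 & c_1 & 0 & 0 & 0 \\ a_2 & b_2 & c_2 & 0 & 0 & 0 \\ a_3 & b_3 & c_3 & 0 & 0 & 0 \\ 0 & 0 & 0 & \alpha_1 & \beta_1 & \gamma_1 \\ 0 & 0 & 0 & \alpha_2 & \beta_2 & \gamma_2 \\ 0 & 0 & 0 & \alpha_3 & \beta_3 & \gamma_3 \end{vmatrix}.$$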

Two of the quarters of this determinant contain 0's only; 
and it will be seen, from the method of forming those 
terms of the determinant which do not contain 0 as a factor, 
that we can fill in either of these quarters in any way we 
like, provided we leave the other quarter alone. Also, in 
order to reduce the determinant from order 2m to order m, 
we ought to get m 1's in the leading diagonal. We therefore 
shift the last m columns to be the first m, and replace 
the first m 0's in the new leading diagonal by 1's. The 
first process involves $m^2$ interchanges: we can avoid change 
of sign of the determinant, in the case where m is odd, by 
changing the signs of the first m rows (before inserting 
the 1's), whether m is even or odd. Then, inserting the 
1's, we get 

$$DE = \begin{vmatrix} 1 & 0 & 0 & -a_1 & -b_1 & -c_1 \\ 0 & 1 & 0 & -a_2 & -b_2 & -c_2 \\ 0 & 0 & 1 & -a_3 & -b_3 & -c_3 \\ \alpha_1 & \beta_1 & \gamma_1 & 0 & 0 & 0 \\ \alpha_2 & \beta_2 & \gamma_2 & 0 & 0 & 0 \\ \alpha_3 & \beta_3 & \gamma_3 & 0 & 0 & 0 \end{vmatrix}.$$

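We now reduce each of the elements in the right-hand 
top quarter to 0 by means of § 3; i.e. we add $a_1$ times 
the 1st column to the (m + 1)th (in this case the 4th), thus 
getting 

$$\begin{vmatrix} 1 & 0 & 0 & 0 & -b_1 & -c_1 \\ 0 & 1 & 0 & -a_2 & -b_2 & -c_2 \\ 0 & 0 & 1 & -a_3 & -b_3 & -c_3 \\ \alpha_1 & \beta_1 & \gamma_1 & a_1 \alpha_1 & 0 & 0 \\ \alpha_2 & \beta_2 & \gamma_2 & a_1 \alpha_2 & 0 & 0 \\ \alpha_3 & \beta_3 & \gamma_3 & a_1 \alpha_3 & 0 & 0 \end{vmatrix};$$

then do the same with $a_2$ times the 2nd column and 
$a_3$ times the 3rd ..., and then deal in the same way 
with the (m + 2)th and (m + 3)th ... columns. We get 
finally 

$$\begin{vmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ \alpha_1 & \beta_1 & \gamma_1 & a_1 \alpha_1 + a_2 \beta_1 + a_3 \gamma_1 & b_1 \alpha_1 + b_2 \beta_1 + b_3 \gamma_1 & c_1 \alpha_1 + c_2 \beta_1 + c_3 \gamma_1 \\ \alpha_2 & \beta_2 & \gamma_2 & a_1 \alpha_2 + a_2 \beta_2 + a_3 \gamma_2 & b_1 \alpha_2 + b_2 \beta_2 + b_3 \gamma_2 & c_1 \alpha_2 + c_2 \beta_2 + c_3 \gamma_2 \\ \alpha_3 & \beta_3 & \gamma_3 & a_1 \alpha_3 + a_2 \beta_3 + a_3 \gamma_3 & b_1 \alpha_3 + b_2 \beta_3 + b_3 \gamma_3 & c_1 \alpha_3 + c_2 \beta_3 + c_3 \gamma_3 \end{vmatrix}.$$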

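By the same reasoning as that employed at the beginning 
of this paragraph, we can replace each of the elements in 
the left-hand bottom quarter of the above determinant 
by 0. Hence, if the determinant formed by the elements 
in the lower right-hand quarter of the above is called P, we 
have, by (i), 

$$DE = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} \times P = P,$$

i.e. 

$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} \times \begin{vmatrix} \alpha_1 & \beta_1 & \gamma_1 \\ \alpha_2 & \beta_2 & \gamma_2 \\ \alpha_3 & \beta_3 & \gamma_3 \end{vmatrix} = \begin{vmatrix} a_1 \alpha_1 + a_2 \beta_1 + a_3 \gamma_1 & b_1 \alpha_1 + b_2 \beta_1 + b_3 \gamma_1 & c_1 \alpha_1 + c_2 \beta_1 + c_3 \gamma_1 \\ a_1 \alpha_2 + a_2 \beta_2 + a_3 \gamma_2 & b_1 \alpha_2 + b_2 \beta_2 + b_3 \gamma_2 & c_1 \alpha_2 + c_2 \beta_2 + c_3 \gamma_2 \\ a_1 \alpha_3 + a_2 \beta_3 + a_3 \gamma_3 & b_1 \alpha_3 + b_2 \beta_3 + b_3 \gamma_3 & c_1 \alpha_3 + c_2 \beta_3 + c_3 \gamma_3 \end{vmatrix}. \qquad \text{(IV. 5. 1)}$$

The reasoning is quite general, and the product of two 
determinants of any order can be written down from the 
above. 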
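
The standard form (IV. 5. 1) can be sketched as follows: the element in column q and row r of the product is built from the qth column of the first determinant and the rth row of the second. The function name `product_determinant` and the numerical values of E are assumptions made for the example.

```python
def det(rows):
    """Determinant by expansion along the first column."""
    if len(rows) == 1:
        return rows[0][0]
    total = 0
    for r in range(len(rows)):
        sub = [row[1:] for i, row in enumerate(rows) if i != r]
        total += (-1) ** r * rows[r][0] * det(sub)
    return total

def product_determinant(D, E):
    """Element (row r, column q) = sum over j of D[j][q] * E[r][j]."""
    m = len(D)
    return [[sum(D[j][q] * E[r][j] for j in range(m)) for q in range(m)]
            for r in range(m)]

D = [[3, -2, 7], [5, 1, -3], [4, 6, 1]]
E = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]
P = product_determinant(D, E)
print(det(P) == det(D) * det(E))   # True
```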

By interchanging columns and rows in one or other 
or both of the original determinants we get three other 
expressions. All four expressions, of course, are equal 
when expanded: we shall take the above to be the 
standard form. It is to be noticed that DE or D × E 
means the product of D and E, in this order; by reversing 
the order of multiplication we get four other forms, but 
these are only the 'transposed' of the previous four. The 
eight forms are given, in the new notation (see V. 6 (vi)), 
in the Appendix (p. 122). They are based on the principle 
that in the standard form the element in the qth column 
and rth row of the product is formed in a particular way 
from the qth column of the first determinant and the rth 
row of the second. 


IV. 6. The adjoint determinant. (i) Let D' denote 
the determinant whose elements are the cofactors of the 
corresponding elements in the original determinant, i.e. 

$$D' = \begin{vmatrix} A_1 & B_1 & C_1 & \ldots & F_1 \\ A_2 & B_2 & C_2 & \ldots & F_2 \\ \vdots & \vdots & \vdots & & \vdots \\ A_m & B_m & C_m & \ldots & F_m \end{vmatrix}. \qquad \text{(IV. 6. A)}$$

If we interchange columns and rows in D', and then 
express the product DD' as in (IV. 5. 1), we find that 
the elements in the principal diagonal of the product are 
$a_1 A_1 + a_2 A_2 + \ldots + a_m A_m$, $b_1 B_1 + b_2 B_2 + \ldots + b_m B_m$, etc., each 
of which, by (II. 4. 1), is = D, while the other elements 
are $a_1 B_1 + a_2 B_2 + \ldots + a_m B_m$, $b_1 A_1 + b_2 A_2 + \ldots + b_m A_m$, etc., 
each of which, by (II. 4. 2), is = 0. Hence 

$$DD' = \begin{vmatrix} D & 0 & 0 & \ldots & 0 \\ 0 & D & 0 & \ldots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \ldots & D \end{vmatrix} = D^m, \qquad \text{(IV. 6. 1)}$$

and therefore 

$$D' = D^{m-1}. \qquad \text{(IV. 6. 2)}$$

The determinant D' was formerly called the reciprocal, but 
is now more usually called the adjoint, of D. It is not 
the true reciprocal of D, since the product of the two is 
not 1 but $D^m$ (cf. V. 2). 

(ii) Let the cofactors of $A_1, B_1, C_1, \ldots F_1$ in D' be denoted 
by $\alpha_1, \beta_1, \gamma_1, \ldots \zeta_1$. Then, applying (II. 4. 3) and (II. 4. 4) 
to D', we have 

$$\begin{aligned} A_1 \alpha_1 + B_1 \beta_1 + C_1 \gamma_1 + \ldots + F_1 \zeta_1 &= D' \\ A_2 \alpha_1 + B_2 \beta_1 + C_2 \gamma_1 + \ldots + F_2 \zeta_1 &= 0 \\ \vdots \qquad & \\ A_m \alpha_1 + B_m \beta_1 + C_m \gamma_1 + \ldots + F_m \zeta_1 &= 0 \end{aligned}$$

We can regard these as equations for determining $\alpha_1, \beta_1, \gamma_1, \ldots \zeta_1$. 
Comparing them with 

$$\begin{aligned} A_1 a_1 + B_1 b_1 + C_1 c_1 + \ldots + F_1 f_1 &= D \\ A_2 a_1 + B_2 b_1 + C_2 c_1 + \ldots + F_2 f_1 &= 0 \\ \vdots \qquad & \\ A_m a_1 + B_m b_1 + C_m c_1 + \ldots + F_m f_1 &= 0 \end{aligned}$$

which are obtained from (II. 4. 3) and (II. 4. 4), we see that 
the solution is 

$$\alpha_1 = \frac{D'}{D} a_1 = D^{m-2} a_1, \qquad \beta_1 = \frac{D'}{D} b_1 = D^{m-2} b_1, \quad \text{etc.}$$

Hence the cofactor of any element of the adjoint determinant 
is equal to the corresponding element of the original determinant, 
multiplied by the ratio of the adjoint determinant to the original 
determinant. 
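
Both results of IV. 6 can be checked on the numerical determinant of IV. 4 (i). The sketch below uses a list-of-rows representation with indices from 0; the helper names are chosen for the example.

```python
def det(rows):
    """Expand by the first column."""
    if len(rows) == 1:
        return rows[0][0]
    return sum(rows[r][0] * cofactor(rows, r, 0) for r in range(len(rows)))

def cofactor(rows, r, q):
    """Signed minor of the element in row r, column q."""
    sub = [row[:q] + row[q + 1:] for i, row in enumerate(rows) if i != r]
    return (-1) ** (r + q) * det(sub)

def adjoint(rows):
    """Replace every element by its cofactor, position for position."""
    m = len(rows)
    return [[cofactor(rows, r, q) for q in range(m)] for r in range(m)]

D = [[3, -2, 7],
     [5, 1, -3],
     [4, 6, 1]]
m = len(D)
adj = adjoint(D)
print(det(adj) == det(D) ** (m - 1))                      # (IV. 6. 2)
print(cofactor(adj, 0, 0) == det(D) ** (m - 2) * D[0][0])  # result of (ii)
```

For this determinant D = 273, so D' = 273 squared, and the cofactor of $A_1$ in D' is 273 × 3 = 819.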


V. THE TENSOR NOTATION* 


V. 1. Main properties of determinant. (i) The 
notation so far used is the ordinary one for elementary work; 
for higher work we reduce the number of letters and make 
a more liberal use of suffixes. We shall replace x, y, z, ... w 
by $X_1, X_2, X_3, \ldots X_m$; $a_1, b_1, c_1, \ldots f_1$ by $d_{11}, d_{21}, d_{31}, \ldots d_{m1}$; 
and so on. Also, it being understood that the values 
assignable to each of the letters q and r are 1, 2, 3, ... m, 
we can use $|d_{qr}|$ and $|d_{rq}|$ to denote the determinants 
whose elements in the qth column and rth row are 
respectively† $d_{qr}$ and $d_{rq}$, i.e. 

$$|d_{qr}| \equiv \begin{vmatrix} d_{11} & d_{21} & d_{31} & \ldots & d_{m1} \\ d_{12} & d_{22} & d_{32} & \ldots & d_{m2} \\ \vdots & \vdots & \vdots & & \vdots \\ d_{1m} & d_{2m} & d_{3m} & \ldots & d_{mm} \end{vmatrix}, \qquad |d_{rq}| \equiv \begin{vmatrix} d_{11} & d_{12} & d_{13} & \ldots & d_{1m} \\ d_{21} & d_{22} & d_{23} & \ldots & d_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ d_{m1} & d_{m2} & d_{m3} & \ldots & d_{mm} \end{vmatrix}, \qquad \text{(V. 1. A)}$$

so that the interchangeability of columns and rows gives 

$$|d_{qr}| = |d_{rq}|. \qquad \text{(V. 1. 1)}$$

* The paragraphs in small print may be omitted on first reading 
but should be read before Chapter VI is taken. 

† It is more usual to have the suffixes the other way round; i.e. 
to use 11, 12, 13, ... 1m as suffixes for the first row. But my 
arrangement seems to follow more naturally from the a, b, ... 
notation, and I also find that it fits in better with the subsequent 
work. 

(ii) The k's in (III. 2. A) were supposed to be known 
quantities, so that the equations served to determine 
x, y, z, ... w. We shall have to consider the relations between 
the set of quantities which we have denoted by x, y, z, ... w 
and those which we have denoted by k v k 2 , k 3> ...k m . We 
therefore, in altering' x, y , z,...iv to A’j, X v X 3 ,...X m , also 
alter k v X 2 j k 3 , ... k m to dq, d 2 , d y , ... l m . 


(iii) The main properties which we have to consider are 
set out below. Definitions are marked with capital letters : 
propositions with arabic numbers. 



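\[
D \equiv
\begin{vmatrix}
d_{11} & d_{21} & \cdots & d_{m1} \\
d_{12} & d_{22} & \cdots & d_{m2} \\
\vdots & \vdots & & \vdots \\
d_{1m} & d_{2m} & \cdots & d_{mm}
\end{vmatrix}
\tag{A}
\]

\[
D_{ps} \equiv \text{cofactor of } d_{ps} \text{ in } D. \tag{B}
\]

\[
(p = 1, 2, \ldots m;\ q = 1, 2, \ldots m) \qquad
d_{q1} D_{p1} + d_{q2} D_{p2} + \cdots + d_{qm} D_{pm} =
\begin{cases} D & \text{if } q = p \\ 0 & \text{if } q \neq p \end{cases}
\tag{1}
\]

\[
D' \equiv
\begin{vmatrix}
D_{11} & D_{21} & \cdots & D_{m1} \\
D_{12} & D_{22} & \cdots & D_{m2} \\
\vdots & \vdots & & \vdots \\
D_{1m} & D_{2m} & \cdots & D_{mm}
\end{vmatrix}
\tag{C}
\]

\[
D' = D^{m-1}. \tag{2}
\]

\[
\text{cofactor of } D_{ps} \text{ in } D' = D^{m-2} d_{ps}. \tag{3}
\]

If

\[
\begin{aligned}
d_{11} X_1 + d_{21} X_2 + d_{31} X_3 + \cdots + d_{m1} X_m &= Y_1 \\
d_{12} X_1 + d_{22} X_2 + d_{32} X_3 + \cdots + d_{m2} X_m &= Y_2 \\
&\;\;\vdots \\
d_{1m} X_1 + d_{2m} X_2 + d_{3m} X_3 + \cdots + d_{mm} X_m &= Y_m
\end{aligned}
\]

then

\[
(p = 1, 2, \ldots m) \qquad
X_p = \frac{D_{p1} Y_1 + D_{p2} Y_2 + D_{p3} Y_3 + \cdots + D_{pm} Y_m}{D}.
\tag{4}
\]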


(iv) As a preliminary, to make the statements more
concise, we may (though this is not essential) introduce
the ordinary notation of summation. Thus (1) can be
written

\[
(p = 1, 2, \ldots m;\ q = 1, 2, \ldots m) \qquad
\sum_{s=1}^{m} d_{qs} D_{ps} =
\begin{cases} D & \text{if } q = p \\ 0 & \text{if } q \neq p \end{cases}
\tag{1}
\]

and (4) can be written

\[
\text{If } (s = 1, 2, \ldots m) \quad \sum_{p=1}^{m} d_{ps} X_p = Y_s,
\qquad
\text{then } (p = 1, 2, \ldots m) \quad X_p = \sum_{s=1}^{m} \frac{D_{ps} Y_s}{D}.
\tag{4}
\]

(v) The tensor notation involves five steps, which are
set out in §§ 2-4, 6, 8 below. The reader will find it
helpful to copy out the statement in (iii), modified as in
(iv), and make the successive alterations which are now to
be described.



V. 2. Reciprocal determinant. — The first step (which
will be found in modern text-books) relates to (1) and (2).
The determinant $D'$ has sometimes been called the reciprocal
of $D$. But, as has already been pointed out in
IV. 6 (i), it is not a true reciprocal, since the product of
the two determinants is not 1 but $D^m$. Since, however,
$D'$ contains $m$ columns, we see that if we form a new
determinant by dividing each element of it by $D$ the product
of this new determinant and $D$ will be 1. We therefore
write (1) in the form

\[
(p = 1, 2, \ldots m;\ q = 1, 2, \ldots m) \qquad
\sum_{s=1}^{m} d_{qs} \frac{D_{ps}}{D} =
\begin{cases} 1 & \text{if } q = p \\ 0 & \text{if } q \neq p \end{cases}
\tag{1}
\]

and form a new determinant $D''$ defined by

\[
D'' \equiv \left| D_{qr} / D \right|. \tag{C}
\]

This new determinant will be equal to $D'/D^m = 1/D$, so
that the product of the two determinants will be 1; and
the cofactor of $D_{ps}/D$ in the new determinant will be
$D^{m-2} d_{ps} / D^{m-1} = d_{ps}/D = d_{ps} D''$. Hence $D$ and $D''$
are so related that their product is 1 and that the cofactor
of any element of either determinant, divided by the
determinant, is equal to the corresponding element of the
other determinant. We can therefore call each determinant
the reciprocal of the other.
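The two reciprocal relations can be checked numerically. The following is only a sketch under stated assumptions: the 3 × 3 array `d` holds arbitrary illustrative values, the helper names `det`, `cofactor` and `reciprocal` are introduced here for illustration and are not the author's, and the determinant is expanded by brute force, which is practical only for small $m$.

```python
from fractions import Fraction
from itertools import permutations


def det(rows):
    """Determinant by the Leibniz expansion (adequate for small m)."""
    m = len(rows)
    total = Fraction(0)
    for perm in permutations(range(m)):
        # Sign of the permutation, from its number of inversions.
        inversions = sum(perm[i] > perm[j]
                         for i in range(m) for j in range(i + 1, m))
        term = Fraction(-1) ** inversions
        for i in range(m):
            term *= rows[i][perm[i]]
        total += term
    return total


def cofactor(rows, i, j):
    """Signed minor of the element at position (i, j)."""
    m = len(rows)
    minor = [[rows[a][b] for b in range(m) if b != j]
             for a in range(m) if a != i]
    return (-1) ** (i + j) * det(minor)


def reciprocal(rows):
    """Array whose elements are (cofactor of each element) / determinant."""
    big_d = det(rows)
    m = len(rows)
    return [[cofactor(rows, i, j) / big_d for j in range(m)] for i in range(m)]


d = [[2, 1, 0], [1, 3, 1], [4, 1, 5]]   # illustrative values only
d_recip = reciprocal(d)

# Product of the two determinants should be 1.
product_is_one = det(d) * det(d_recip) == 1

# Cofactor of each element of D'', divided by D'', should give back d.
cofactors_return_d = all(
    cofactor(d_recip, i, j) / det(d_recip) == d[i][j]
    for i in range(3) for j in range(3)
)
print(product_is_one, cofactors_return_d)
```

Exact fractions are used so that the equalities are tested without rounding error.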

V. 3. Elements of reciprocal determinant. — The
next step is to have a single symbol for

\[
(\text{cofactor of } d_{ps} \text{ in } D) \div D.
\]

We have already used $D_{ps}$ for the cofactor of $d_{ps}$; and it is
inconvenient to introduce a new letter in place of $d$ or $D$.
We therefore denote the above expression by

\[
d^{ps}.
\]

We accordingly, in our statement, replace (B), (1), (C), (2),
(3), and the second line of (4), by

\[
d^{ps} \equiv (\text{cofactor of } d_{ps} \text{ in } D) \div D \tag{B}
\]

\[
(p = 1, 2, \ldots m;\ q = 1, 2, \ldots m) \qquad
\sum_{s=1}^{m} d_{qs} d^{ps} =
\begin{cases} 1 & \text{if } q = p \\ 0 & \text{if } q \neq p \end{cases}
\tag{1}
\]

\[
D'' \equiv |d^{qr}| \tag{C}
\]

\[
D D'' = 1 \tag{2}
\]

\[
d_{ps} = (\text{cofactor of } d^{ps} \text{ in } D'') \div D'' \tag{3}
\]

\[
(p = 1, 2, \ldots m) \qquad X_p = \sum_{s=1}^{m} d^{ps} Y_s. \tag{[2nd line of] 4}
\]

The parallelism of (B) and (3), and of the two lines of (4),
should be noted.

V. 4. Set-notation. — (i) The next step is to abbreviate
(4), as altered. This contains two statements, a hypothesis
and a conclusion; they are similar in form, so that we
need only consider the first one, namely

\[
(s = 1, 2, \ldots m) \qquad \sum_{p=1}^{m} d_{ps} X_p = Y_s.
\]

(ii) The expression on the left-hand side of this statement
has a definite value for each value of $s$; we can
denote these values by

\[
E_1, E_2, E_3, \ldots E_m.
\]

The statement then takes the form

\[
(s = 1, 2, \ldots m) \qquad E_s = Y_s.
\]

We cannot merely omit the ‘$(s = 1, 2, \ldots m)$’ from this,
without leaving it doubtful whether we are speaking of
some particular $s$ or of each $s$. We get over the difficulty
by omitting the ‘$(s = 1, 2, \ldots m)$’ and replacing the $s$
in ‘$E_s = Y_s$’ by a Greek letter. The convention then is
that a statement of the form

\[
E_\sigma = Y_\sigma
\]

means that $E_s$ is equal to $Y_s$ for each of the values of $s$,
i.e. that

\[
E_1 = Y_1,\ E_2 = Y_2,\ \ldots\ E_m = Y_m;
\]

it being understood that the values $1, 2, 3, \ldots m$ which
are to be given to a Greek letter have been settled beforehand
and remain the same throughout our work.

(iii) Applying this convention to the two statements
in (4), it becomes:

\[
\text{If } \sum_{p=1}^{m} d_{p\sigma} X_p = Y_\sigma, \text{ then } X_\lambda = \sum_{s=1}^{m} d^{\lambda s} Y_s. \tag{4}
\]

It is immaterial what Greek letter we use in either of
these statements, provided the letter is the same on both
sides. We could have used the same letter in the two
statements, but in the particular case it is better to have
the letters different.*

We have rather spoilt the symmetry of (4), but we will
put this right in § 6.


(iv) The statement in (1) is a statement as to the value 
of a certain expression for all values of p and all values 
of q ; and, so far as the left-hand side is concerned, we 
could extend the above principle by using two Greek 
letters. But the right-hand side presents difficulties ; and 
we must therefore leave this over for later consideration (§ 8). 


* I have as far as possible used $\lambda$, $\mu$, $\rho$, $\sigma$ in this chapter to correspond
to $p$, $q$, $r$, $s$, reserving $\nu$ for product-sums (§ 6). Later on it
is better to have no fixed rule, beyond that laid down in § 5 (iii).

V. 5. Principles of set-notation. — It is desirable at this stage
to consider the principles underlying the notation which we have
just adopted.

(i) Take first the case of a single set of $m$ quantities or elements;
i.e. an aggregate of $m$ quantities which fall into a certain linear
arrangement. We denote these by, say,

\[
A_1\ A_2\ A_3\ \ldots\ A_m.
\]

We have settled that a statement such as

\[
A_\lambda = E_\lambda
\]

is a comprehensive way of saying that

\[
A_1 = E_1,\ A_2 = E_2,\ A_3 = E_3,\ \ldots\ A_m = E_m.
\]

Thus we use $A_p$ etc. when we are referring to a particular member
of the set, and we use $A_\lambda$ etc. when we are making a statement
with regard to each member of the set in turn. We may also
want to speak of the set as a whole. It will be found that no
confusion arises from using $A_\lambda$ in this sense also. We can therefore
say that

\[
A_\lambda \equiv (A_1\ A_2\ A_3\ \ldots\ A_m),
\]

the brackets being used in order to show that we are considering
the set as a whole. In this sense we might regard the statement
$A_\lambda = E_\lambda$ as meaning that the set $A_\lambda$ as a whole is equal to the set
$E_\lambda$ as a whole, this equality of the wholes implying equality of the
parts. We shall have to take this step later (Chapter VI): for the
present it will be sufficient to regard the statement $A_\lambda = E_\lambda$ as
merely an abbreviated way of saying that $A_1 = E_1$, $A_2 = E_2$, etc.

(ii) The interpretation of $A_\lambda$ as the set as a whole recalls the
idea of a vector as the resultant of a number of components. But
the operations which we shall have to perform with single sets do
not follow exactly the same laws as those which govern operations
with vectors as ordinarily understood (see note to (vi) below), so that
the analogy must not be pushed too far.

(iii) Next take the case of a double set of $m^2$ quantities, i.e. a set
consisting of $m$ single sets, each containing $m$ elements. If we
denote the elements of the $q$th single set by $F_{q1}, F_{q2}, \ldots F_{qm}$,
this single set as a whole can be called $F_{q\rho}$, and the complete
double set can be called $F_{\mu\rho}$. We think of the elements as
arranged in a square, the columns of which are the single sets: by
regrouping (e.g. $F_{1r}, F_{2r}, \ldots F_{mr}$) we get the rows of the square.
We have already, in § 1, adopted the convention that in $d_{qr}$ or $d_{rq}$
the $q$ represents the column and the $r$ the row; and similarly we
shall say that in $F_{\mu\rho}$ or $F_{\rho\mu}$ etc. the first letter, according to
alphabetical order, means the column and the second the row. Hence

\[
F_{\mu\rho} \equiv
\begin{pmatrix}
F_{11} & F_{21} & F_{31} & \cdots & F_{m1} \\
F_{12} & F_{22} & F_{32} & \cdots & F_{m2} \\
\vdots & \vdots & \vdots & & \vdots \\
F_{1m} & F_{2m} & F_{3m} & \cdots & F_{mm}
\end{pmatrix},
\qquad
F_{\rho\mu} \equiv
\begin{pmatrix}
F_{11} & F_{12} & F_{13} & \cdots & F_{1m} \\
F_{21} & F_{22} & F_{23} & \cdots & F_{2m} \\
\vdots & \vdots & \vdots & & \vdots \\
F_{m1} & F_{m2} & F_{m3} & \cdots & F_{mm}
\end{pmatrix},
\]

the brackets being inserted, as before, in order to show that the
set of quantities is in each case to be regarded as a whole. Then
the statements

\[
F_{\mu\rho} = G_{\mu\rho}, \qquad F_{\mu\rho} = H_{\rho\mu},
\]

mean respectively that $F_{qr} = G_{qr}$, and that $F_{qr} = H_{rq}$, for every
value of $q$ taken with every value of $r$.

A double set is symmetrical if it is not altered by interchanging
columns and rows.

(iv) A particular form of double set is obtained by multiplying
together every element of one single set and every element of
another single set (of the same number of elements). If these two
sets are $B_\mu$ and $C_\rho$ (in this order), the representative element of
the resulting double set will be $B_q C_r$, so that the double set can
be represented by $B_\mu C_\rho$ or $C_\rho B_\mu$. This double set is called the
product of the two single sets. It should be noted that we must
not write it as ‘$B_\mu C_\mu$’ or as ‘$B_\rho C_\rho$’; partly because this would
not define the particular arrangement of the elements of the
double set, and partly because we shall presently have to give
a special meaning to these latter expressions.
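As an illustration of (iv), the product of two single sets can be tabulated directly. This is a minimal sketch: the function names and the numerical values are assumptions made for the example, and a double set is stored as a list of columns, so that `x[q][r]` is the element in column `q` and row `r` (counting from 0), following the column-first convention of § 1.

```python
def outer(b, c):
    """Double set B_mu C_rho, stored as a list of columns.

    result[q][r] is the element in column q and row r, namely b[q] * c[r].
    """
    return [[bq * cr for cr in c] for bq in b]


def transposed(double_set):
    """Interchange the columns and rows of a square double set."""
    m = len(double_set)
    return [[double_set[r][q] for r in range(m)] for q in range(m)]


B = [1, 2, 3]      # illustrative values only
C = [10, 20, 50]

BC = outer(B, C)   # plays the part of B_mu C_rho
CB = outer(C, B)   # plays the part of C_mu B_rho, a different arrangement

print(BC[1][2])    # element in column 1, row 2: B[1] * C[2]
```

In the text the suffixes fix the arrangement, so that $B_\mu C_\rho$ and $C_\rho B_\mu$ are the same double set; in the sketch the order of the arguments plays that part, which is why `outer(C, B)` gives the transposed arrangement and not the same one.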

(v) In addition to double sets and single sets, we have to use 
single quantities, such as a or k. Any such quantity is called 
a scalar. It need not be a constant : it may, as will be seen later, 
be a definite function of the elements of one or more sets. 

(vi) We shall for the present be dealing only with expressions
which, interpreted according to the laws of ordinary algebra, are
obtained from scalars, single sets, and double sets by addition,
subtraction, and multiplication. The rule of interpretation is the
same that we adopted in (iv) for $B_\mu C_\rho$: we replace the Greek
letters $\lambda\ \mu\ \rho \ldots$ by $p\ q\ r \ldots$ and take the total expression to be
the set obtained by giving to each of the quantities $p\ q\ r \ldots$ each
of the values $1\ 2\ 3 \ldots m$. For example:

(a) $kA_\mu$ means the single set whose elements are
$kA_1, \ldots kA_m$;

(b) $A_\lambda \pm bB_\lambda$ means the single set whose elements are*
$A_1 \pm bB_1,\ A_2 \pm bB_2,\ \ldots\ A_m \pm bB_m$;

(c) $A_{\mu\rho} - aF_\rho G_\mu$ means the double set whose element in
the $q$th column and the $r$th row is $A_{qr} - aF_r G_q$.

It is obvious that this system of interpretation is in accordance
with the laws of ordinary algebra; for instance

\[
k(A_\mu + B_\mu) = kA_\mu + kB_\mu = kB_\mu + kA_\mu,
\]

\[
F_\lambda (G_\mu - H_\mu) = F_\lambda G_\mu - F_\lambda H_\mu,
\]

and so on.

We are further restricted, in the case of expressions containing
more than one term, to (1) scalar expressions, (2) single sets
arising as sums (‘sum’ of course including ‘difference’) of expressions
which contain the same letter, e.g. $aA_\lambda + bB_\lambda + \ldots$, (3)
double sets arising as sums of expressions which contain the same
pair of letters, e.g. $aA_{\mu\nu} + bB_\nu C_\mu$. We do not therefore have to
consider such an expression as $A_\lambda + B_\mu$, which is really a double
set, not a single set.

* It will be seen from (vi) (a) and (b) and from (iv) that single
sets follow the same rule as ordinary vectors as regards multiplication
by a scalar, addition, and subtraction, but not as regards
multiplication together.

(vii) The suffixes which we have so far attached to a symbol have
usually, in accordance with the regular practice in algebra and
with the ordinary meaning of the word, been placed below the
line: the exception being the use of $d^{ps}$ to mean (cofactor of $d_{ps}$
in $D$) $\div\ D$. This latter system of having upper suffixes as well as
lower suffixes will sometimes be found convenient. We may, for
instance, want to denote a single set by $A^\lambda$; and in that case
$A^1, A^2, \ldots$ would be members of the set. Where there is any risk
of confusion, we shall not use the ordinary indices of algebra at
all; thus the square of $A_p$ will be $A_p A_p$, not $A_p^2$.

V. 6. Product-sum notation. — (i) Our next simplification
consists in dropping the sign of summation in (1)
and (4). But, since merely to drop it and to replace, say,
$\sum_{s=1}^{m} d^{rs} d_{qs}$ by $d^{rs} d_{qs}$ would be misleading, we use a
special notation. The number of alphabets at our disposal
is limited: and it will be found not only that we can use
Greek letters for this purpose without risk of error, but
that there are actual advantages in doing so.

(ii) The rule we adopt is that, when an expression of the
form $B_p C_p$ has to be summed for the values $1, 2, \ldots m$
of $p$, we denote the result by replacing $p$ by a Greek letter
in both places; and, conversely, the meaning of such an
expression as $B_\nu C_\nu$ is

\[
B_\nu C_\nu \equiv B_1 C_1 + B_2 C_2 + \cdots + B_m C_m. \tag{V. 6. A}
\]


(iii) The pair of $\nu$'s in $B_\nu C_\nu$ could be replaced by a pair
of any other identical Greek letters; e.g.

\[
B_\nu C_\nu = B_\rho C_\rho. \tag{V. 6. 1}
\]

The $\nu$ (or $\rho$) is for this reason called a dummy. We can
think of the sum represented by $B_\nu C_\nu$ as the result of linking
the elements of $B_\nu$ with the corresponding elements
of $C_\nu$; we can therefore describe a Greek letter which
occurs twice as a linked suffix, and one which occurs once
only as a free suffix.

(iv) The rule in (ii) applies if either or both of the expressions
$B_p$ and $C_p$ has a free suffix as well as the $p$; e.g.
$A_\nu B_{\lambda\nu}$ means $A_1 B_{\lambda 1} + A_2 B_{\lambda 2} + \cdots + A_m B_{\lambda m}$, and $A_{\lambda\nu} B_{\mu\nu}$
means $A_{\lambda 1} B_{\mu 1} + A_{\lambda 2} B_{\mu 2} + \cdots + A_{\lambda m} B_{\mu m}$, which is a double
set whose typical element is $A_{p\nu} B_{q\nu}$.
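The rule in (ii) and its extension in (iv) can be sketched as follows. The function names and numbers are assumptions made for the example; each doubly suffixed quantity is indexed as `x[first suffix][second suffix]`, with suffix values counted from 0 rather than 1.

```python
def product_sum(b, c):
    """B_nu C_nu: the sum of products of corresponding elements."""
    return sum(bv * cv for bv, cv in zip(b, c))


def linked_second_suffix(a, b):
    """A_{lambda nu} B_{mu nu}: nu is linked, lambda and mu remain free.

    The result is indexed as result[lambda][mu].
    """
    m = len(a)
    return [[sum(a[p][v] * b[q][v] for v in range(m)) for q in range(m)]
            for p in range(m)]


B = [1, 2, 3]
C = [4, 5, 6]
scalar = product_sum(B, C)             # 1*4 + 2*5 + 3*6 = 32

A2 = [[1, 2], [3, 4]]
B2 = [[5, 6], [7, 8]]
double = linked_second_suffix(A2, B2)  # [[17, 23], [39, 53]]
print(scalar, double)
```

The linked suffix disappears from the result, while each free suffix survives as an index of it; this is why renaming the dummy changes nothing.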

(v) We can also have successive summations expressed 
in the same way. Thus 



involves summations with regard to A, with regard to pi, 
and with regard to p. It is easy to show that these sum- 
mations can be made in any order: e.g. we can tak e 
and 1) together as if the A and p were free, and then 
bring in B x and E np . 

(vi) As an example of the brevity effected by this
notation we may take the expression for the product of
determinants. Even for so small a value of $m$ as 3, the
expression obtained in IV. 5 (ii) for the product of two
determinants is formidable. We can condense it by introducing
$\Sigma$'s, but the result is clumsy. In the new notation
it will be found that the method of IV. 5 (ii) gives

\[
|A_{qr}| \times |B_{qr}| = |A_{q\nu} B_{\nu r}|. \tag{V. 6. 2}
\]


The process can be repeated: e.g.


I V 1 X I V I X V I X I (l qr I = ■ ( V - 6 - 3) 

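(vii) The result of applying the product-sum notation to
the statement in § 1 (iii) is that (1) and (4) become ($p$ and
$s$ in (4) being replaced by $\lambda$ and $\sigma$ respectively)

\[
(p = 1, 2, \ldots m;\ q = 1, 2, \ldots m) \qquad
d^{p\sigma} d_{q\sigma} =
\begin{cases} 1 & \text{if } q = p \\ 0 & \text{if } q \neq p \end{cases}
\tag{1}
\]

\[
\text{If } d_{\lambda\sigma} X_\lambda = Y_\sigma \text{ then } X_\lambda = d^{\lambda\sigma} Y_\sigma. \tag{4}
\]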


V. 7. Inner products of sets. — (i) The quantity $B_\nu C_\nu$ behaves
in many respects like an algebraical product. We call it the inner
product of $B_\nu$ and $C_\nu$, to distinguish it from an ordinary or outer
product such as $B_\lambda C_\mu$ (§ 5 (iv)). The inner product of $B_\nu$ and $C_\nu$ is
the sum of the elements in the leading diagonal of the outer
product of $B_\lambda$ and $C_\mu$.

(ii) In the same way $A_\nu B_{\mu\nu}$ is the inner product of $A_\nu$ and
$B_{\mu\nu}$, and $A_{\lambda\nu} B_{\nu\rho}$ is the inner product of $A_{\lambda\nu}$ and $B_{\nu\rho}$.

(iii) The process of forming an inner product, as above, may be
called inner multiplication.
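The remark in (i), that the inner product is the sum of the leading diagonal of the outer product, can be illustrated by a short sketch. The names and values below are assumptions chosen for the example only.

```python
def outer(b, c):
    """Outer product B_lambda C_mu as a square array of all products."""
    return [[bq * cr for cr in c] for bq in b]


def inner(b, c):
    """Inner product B_nu C_nu."""
    return sum(bv * cv for bv, cv in zip(b, c))


def leading_diagonal_sum(double_set):
    """Sum of the elements whose two suffixes are equal."""
    return sum(double_set[i][i] for i in range(len(double_set)))


B = [2, -1, 4]
C = [3, 5, 7]

print(inner(B, C), leading_diagonal_sum(outer(B, C)))
```

Both quantities are $2 \cdot 3 - 1 \cdot 5 + 4 \cdot 7$, since the diagonal of the outer product consists precisely of the products $B_1 C_1, B_2 C_2, \ldots$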


V. 8. Unit-set notation. — (i) We have finally to consider
the form of (1), which is a statement that

\[
(p = 1, 2, \ldots m;\ q = 1, 2, \ldots m) \qquad
d^{p\sigma} d_{q\sigma} =
\begin{cases} 1 & \text{if } q = p \\ 0 & \text{if } q \neq p \end{cases}
\]

So far as the left-hand side is concerned, this is a statement
as to the values of the elements of a double set

\[
d^{\lambda\sigma} d_{\mu\sigma}.
\]

As regards the right-hand side, however, the statement falls
into two; first that $d^{p\sigma} d_{p\sigma} = 1$, and next that $d^{p\sigma} d_{q\sigma} = 0$ if
$p$ and $q$ are different. We want to replace these by a single
statement.

(ii) We do this by converting the statement into one as
to the equality of two double sets. For this purpose we
construct a set whose typical element, in the $q$th column
and $r$th row, is 1 if $q$ and $r$ are the same and 0 if they are
different; a set, in other words, which has 1 for each
element of its leading diagonal and 0 everywhere else. If
we call this set*

\[
|^{\lambda}_{\mu},
\]

then our definition of $|^{\lambda}_{\mu}$ is that

\[
|^{p}_{q} \equiv \text{the function of } p \text{ and } q \text{ which is} =
\begin{cases} 1 & \text{if } q = p \\ 0 & \text{if } q \neq p \end{cases}
\tag{V. 8. A}
\]

We can therefore write (1) in the form

\[
(p = 1, 2, \ldots m;\ q = 1, 2, \ldots m) \qquad d^{p\sigma} d_{q\sigma} = |^{p}_{q},
\]

or, in the set-notation,

\[
d^{\lambda\sigma} d_{\mu\sigma} = |^{\lambda}_{\mu}. \tag{1}
\]


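* The usual symbol, adopted by Einstein, is $\delta^{\lambda}_{\mu}$; J. E. Wright
(‘Invariants of quadratic differential forms’) uses another symbol. Neither of
these seems sufficiently distinctive; and $\delta$ already has a considerable
number of other uses. I have therefore altered the symbol to
$|^{\lambda}_{\mu}$ (‘unit $\lambda\mu$’), as an experiment.

V. 9. Properties of the unit set. — (i) We have defined $|^{\lambda}_{\mu}$ as
the set whose typical element is

\[
|^{p}_{q} \equiv
\begin{cases} 1 & \text{if } q = p \\ 0 & \text{if } q \neq p \end{cases}
\tag{V. 9. A}
\]

Hence each element of the leading diagonal of $|^{\lambda}_{\mu}$ is 1, and the
other elements are all 0; in other words

\[
|^{\lambda}_{\mu} \equiv
\begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\tag{V. 9. B}
\]

This set (with any pair of letters) will be called the unit set. The
following are its chief properties.

(ii) The set is symmetrical, i.e.

\[
|^{\lambda}_{\mu} = |^{\mu}_{\lambda}. \tag{V. 9. 1}
\]

(iii) The determinant of the set is

\[
\bigl|\,|^{q}_{r}\,\bigr| =
\begin{vmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{vmatrix}
= 1. \tag{V. 9. 2}
\]

(iv) Also, if we multiply together this determinant and any other
determinant, in either order, it will be found that we merely
reproduce the latter, i.e.

\[
\bigl|\,|^{q}_{r}\,\bigr| \times |A_{qr}| = |A_{qr}| \times \bigl|\,|^{q}_{r}\,\bigr| = |A_{qr}|. \tag{V. 9. 3}
\]

(v) The special importance of the set, or of any column or row
of it, lies in its effect when combined with another set to form
a product-sum. It will be found that, $t$ having any one of the
values $1, 2, 3, \ldots m$,

\[
|^{t}_{\mu} A_\mu = |^{\mu}_{t} A_\mu = A_t, \qquad |^{t}_{\mu} A_{\mu\nu} = |^{\mu}_{t} A_{\mu\nu} = A_{t\nu}, \tag{V. 9. 4}
\]

\[
|^{\lambda}_{\mu} A_\mu = |^{\mu}_{\lambda} A_\mu = A_\lambda, \qquad |^{\lambda}_{\mu} A_{\mu\nu} = |^{\mu}_{\lambda} A_{\mu\nu} = A_{\lambda\nu}. \tag{V. 9. 5}
\]

[For example, take $t = 3$ in the first part of (V. 9. 4). Then

\[
\begin{aligned}
|^{3}_{\mu} A_\mu &= |^{3}_{1} A_1 + |^{3}_{2} A_2 + |^{3}_{3} A_3 + |^{3}_{4} A_4 + \cdots \\
&= 0 \cdot A_1 + 0 \cdot A_2 + 1 \cdot A_3 + 0 \cdot A_4 + \cdots \\
&= A_3.]
\end{aligned}
\]

Thus the effect of inner multiplication by $|^{\lambda}_{\mu}$ of a single or double
set which contains $\mu$ (or $\lambda$) is to alter the latter to $\lambda$ (or $\mu$).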
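The substitution property of the unit set can be tried out on small sets. The sketch below is an illustration only: `unit_set`, `relabel_single` and `relabel_double` are names invented for the example, and suffix values run from 0 instead of 1.

```python
def unit_set(m):
    """Square array with 1 on the leading diagonal and 0 elsewhere."""
    return [[1 if p == q else 0 for q in range(m)] for p in range(m)]


def relabel_single(a):
    """Form the product-sum of the unit set with the single set A_mu."""
    m = len(a)
    u = unit_set(m)
    return [sum(u[lam][mu] * a[mu] for mu in range(m)) for lam in range(m)]


def relabel_double(a):
    """Form the product-sum of the unit set with A_{mu nu} over mu."""
    m = len(a)
    u = unit_set(m)
    return [[sum(u[lam][mu] * a[mu][nu] for mu in range(m))
             for nu in range(m)] for lam in range(m)]


A1 = [7, 8, 9, 10]
A2 = [[1, 2], [3, 4]]

# Row t = 2 of the unit set picks out the single element A1[2].
picked = sum(unit_set(4)[2][mu] * A1[mu] for mu in range(4))
print(picked, relabel_single(A1), relabel_double(A2))
```

Every element is reproduced unchanged; all that the product-sum has done is to attach the result to the other suffix of the unit set.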


V. 10. Determinant properties. — (i) Before we write
down the final results, there is another small change which
we shall find it convenient to make. In the statement
$d^{\lambda\sigma} d_{\mu\sigma} = |^{\lambda}_{\mu}$, obtained in § 8 (ii), the linked suffixes are an
upper $\sigma$ and a lower $\sigma$, which cancel one another; and the
free suffixes $\lambda$ and $\mu$ are in the same respective positions
on the two sides. It is desirable that, whenever possible,
these two conditions should exist. In each of the statements
$d_{\lambda\sigma} X_\lambda = Y_\sigma$ and $X_\lambda = d^{\lambda\sigma} Y_\sigma$, given at the end of § 6,
one of the conditions exists but the other does not. We
make them both exist by replacing $X_\lambda$, throughout, by

\[
X^\lambda \equiv (X^1\ X^2\ \ldots\ X^m),
\]

as explained in § 5 (vii).

(ii) Our statement, after carrying out the alterations
indicated in §§ 2-4, 6 and 8, and in (i) above, becomes:

Notation.

\[
D \equiv |d_{qr}|. \tag{V. 10. A}
\]

\[
d^{ps} \equiv (\text{cofactor of } d_{ps} \text{ in } D) \div D. \tag{V. 10. B}
\]

\[
D'' \equiv |d^{qr}|. \tag{V. 10. C}
\]

Properties.

\[
d^{\lambda\sigma} d_{\mu\sigma} = |^{\lambda}_{\mu}. \tag{V. 10. 1}
\]

\[
D D'' = 1. \tag{V. 10. 2}
\]

\[
d_{ps} = (\text{cofactor of } d^{ps} \text{ in } D'') \div D''. \tag{V. 10. 3}
\]

\[
\text{If } Y_\sigma = d_{\lambda\sigma} X^\lambda \text{ then } X^\lambda = d^{\lambda\sigma} Y_\sigma. \tag{V. 10. 4}
\]

To these we may add the formula (V. 6. 2) for multiplication
of determinants, namely

\[
|A_{qr}| \times |B_{qr}| = |A_{q\nu} B_{\nu r}|. \tag{V. 10. 5}
\]

V. 11. Example of method. — To illustrate the methods
that we are now able to use, let us verify (V. 10. 4) by
means of (V. 10. 1). It is given that

\[
Y_\sigma = d_{\lambda\sigma} X^\lambda.
\]

To find the value of $d^{\lambda\sigma} Y_\sigma$, it will not do to replace $Y_\sigma$ by
the above value as it stands, since we should then have
three $\lambda$'s. We must first replace $\lambda$ in the expression for
$Y_\sigma$ by some other suffix, say $\mu$. We then have, by (V. 10. 1)
and (V. 9. 5),

\[
d^{\lambda\sigma} Y_\sigma = d^{\lambda\sigma} d_{\mu\sigma} X^\mu = |^{\lambda}_{\mu} X^\mu = X^\lambda,
\]

which is what we wanted to prove. The reader will find
it instructive, for comparison, to write out the proof in the
ordinary notation.

In the above proof we have proceeded from $d^{\lambda\sigma} (d_{\mu\sigma} X^\mu)$
to $(d^{\lambda\sigma} d_{\mu\sigma}) X^\mu$. It has already been pointed out, in § 6 (v),
that summations in a case of this kind can be made in any
order.
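The verification above can be repeated with numbers for the case $m = 2$. This is a sketch under stated assumptions: the values of `d` and `X` are arbitrary, the variable names are chosen for the example, suffixes are counted from 0, and `d[p][s]` stands for $d_{ps}$ (column `p`, row `s`).

```python
from fractions import Fraction

# d[p][s] holds d_{ps}: column p, row s. Values are illustrative only.
d = [[Fraction(2), Fraction(1)],
     [Fraction(5), Fraction(3)]]
m = 2

# The determinant D of the 2 x 2 array.
big_d = d[0][0] * d[1][1] - d[1][0] * d[0][1]

# d_up[p][s] holds d^{ps} = (cofactor of d_{ps} in D) / D. For m = 2 the
# cofactor is the diagonally opposite element with sign (-1)^(p+s).
d_up = [[(-1) ** (p + s) * d[1 - p][1 - s] / big_d for s in range(m)]
        for p in range(m)]

X = [Fraction(4), Fraction(-1)]

# Hypothesis of (V. 10. 4): Y_sigma = d_{lambda sigma} X^lambda.
Y = [sum(d[lam][sig] * X[lam] for lam in range(m)) for sig in range(m)]

# Conclusion of (V. 10. 4): X^lambda = d^{lambda sigma} Y_sigma.
X_back = [sum(d_up[lam][sig] * Y[sig] for sig in range(m))
          for lam in range(m)]

# (V. 10. 1): d^{lambda sigma} d_{mu sigma} should be the unit set.
unit_check = [[sum(d_up[lam][sig] * d[mu][sig] for sig in range(m))
               for mu in range(m)] for lam in range(m)]

print(Y, X_back, unit_check)
```

The recovered `X_back` agrees with `X`, and `unit_check` is the unit set, which is the numerical counterpart of the two steps of the proof.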



SETS 


VI. SETS OF QUANTITIES 


VI. 1. Introductory. — (i) It may have been noticed that
in V. 11, in deducing (V. 10. 4) from (V. 10. 1) and (V. 9. 5),
we made no direct use of determinant properties: the only
indirect use being in the relations between elements and
their cofactors, from which (V. 10. 1) was derived. But
we can dispense even with this indirect use. In the equations
$d_{\lambda\sigma} X^\lambda = Y_\sigma$ the values of $d_{\lambda\sigma}$ are supposed to be
known; and we can treat the statement in (V. 10. 1),
namely

\[
d^{\lambda\sigma} d_{\mu\sigma} = |^{\lambda}_{\mu},
\]

as a set of equations giving the values of $d^{\lambda\sigma}$ in terms of
those of $d_{\lambda\sigma}$. If, for instance, $m = 20$, the set $d^{\lambda\sigma}$ contains
400 elements, and (V. 10. 1) is a condensed statement of
the 400 equations (each with 20 terms on one side) which
give the 400 values of $d^{\lambda\sigma}$. Thus for $\lambda = 2$ we should
have

\[
\left.
\begin{aligned}
d_{11} d^{21} + d_{12} d^{22} + d_{13} d^{23} + \cdots + d_{1m} d^{2m} &= 0 \\
d_{21} d^{21} + d_{22} d^{22} + d_{23} d^{23} + \cdots + d_{2m} d^{2m} &= 1 \\
d_{31} d^{21} + d_{32} d^{22} + d_{33} d^{23} + \cdots + d_{3m} d^{2m} &= 0 \\
\vdots \qquad\qquad & \\
d_{m1} d^{21} + d_{m2} d^{22} + d_{m3} d^{23} + \cdots + d_{mm} d^{2m} &= 0
\end{aligned}
\right\}
\]

which give the values of $d^{21}, d^{22}, d^{23}, \ldots d^{2m}$, i.e. of $d^{2\sigma}$.
Similarly for $d^{3\sigma}$, $d^{4\sigma}$, etc.

(ii) We have, in fact, arrived at a position similar to
that reached at the end of the first chapter. We started
with the problem of solving a set of simultaneous equations,
and arrived at a probable solution, involving what we
called determinants. To verify the solution, we had to
investigate the properties of determinants. The determinant
thus took the leading place, its applicability to the
solving of equations being one only of its properties.
A determinant of order $m$ is based on a set of $m^2$ quantities,
which for convenience of reference are thought of as
arranged in a square, the determinant being expressed by
enclosing the set of symbols of the quantities between
vertical lines: and we have reached the stage at which the
set of quantities becomes the important thing, its existence
as the basis of a determinant being one only of its
properties.

(iii) These properties we have now to consider. The 
following sections of this chapter are mainly a restatement, 
with obvious modifications and extensions, of results ob- 
tained in the preceding chapter. 

VI. 2. Single sets. — (i) We may have a single set

\[
A_\rho \equiv (A_1\ A_2\ A_3\ \ldots\ A_m).
\]

The separate quantities $A_1\ A_2 \ldots A_m$ comprised in the set
are called its elements. The order of the set is the number
of elements comprised in it. The typical statement with
regard to such a set is of the form

\[
A_\rho = E_\rho.
\]

This, in the first instance, we regarded merely as a short
way of saying that

\[
A_1 = E_1,\ A_2 = E_2,\ \ldots\ A_m = E_m;
\]

but we must now think of it as a statement that the two
sets $A_\rho$ and $E_\rho$, each taken as a whole, are equal, this
equality of the wholes implying the equality of corresponding
elements. The analogy of a vector may help us here.
The statement that two vectors are equal implies that the
components are equal, each to each; but what we really
think of is not the separate equalities of the components
but the equality, in all respects, of the vectors.

(ii) Single sets behave like ordinary vectors as regards
addition and subtraction and multiplication by a scalar
(§ 4 (iii)), but not as regards multiplication of one single
set by another.


VI. 3. Double sets. — (i) We may have a double set of
order $m$ (i.e. comprising $m^2$ elements)

\[
A_{\mu\rho} \equiv
\begin{pmatrix}
A_{11} & A_{21} & A_{31} & \cdots & A_{m1} \\
A_{12} & A_{22} & A_{32} & \cdots & A_{m2} \\
\vdots & \vdots & \vdots & & \vdots \\
A_{1m} & A_{2m} & A_{3m} & \cdots & A_{mm}
\end{pmatrix}.
\]

Here the quantity in the $q$th column and $r$th row is $A_{qr}$.
We adopt the convention that the first Greek letter (in
alphabetical order) represents the column and the second
the row, so that

\[
A_{\rho\mu} \equiv
\begin{pmatrix}
A_{11} & A_{12} & A_{13} & \cdots & A_{1m} \\
A_{21} & A_{22} & A_{23} & \cdots & A_{2m} \\
\vdots & \vdots & \vdots & & \vdots \\
A_{m1} & A_{m2} & A_{m3} & \cdots & A_{mm}
\end{pmatrix},
\]

the representative element of which is $A_{rq}$. The sets $A_{\mu\rho}$
and $A_{\rho\mu}$ are called the transposed of each other.

(ii) The determinant

\[
|A_{qr}|
\]

is called the determinant of the set* $A_{\mu\rho}$, and similarly
$|A_{rq}|$ is the determinant of $A_{\rho\mu}$.

* The set is usually called the matrix of the determinant. It is
a singularly inappropriate name, as the symbol of the set is the
inner part of that of the determinant, not something which surrounds
it. The set is really the substance or core of the determinant.


(iii) The brackets in which the elements of $A_\rho$, $A_{\mu\rho}$, $A_{\rho\mu}$
have been placed are not essential,* and have been introduced
partly to help the eye and partly to indicate that the
sets are being considered as a whole.

(iv) A double set is symmetrical if columns and rows can
be interchanged without altering it. Thus, if $A_{\mu\rho}$ is symmetrical,
then $A_{\mu\rho} = A_{\rho\mu}$: and conversely.

VI. 4. Sets generally. — (i) We describe a single set
as being of rank† 1, and a double set as being of rank 2.

(ii) Similarly a set of rank 3 of order $m$ is made up of $m$
double sets of order $m$; and so on. Thus we might represent
a set of rank 3 by $A_{\lambda\mu\rho}$. There would have to be
a convention as to the order of the symbols, so as to distinguish
$A_{\lambda\mu\rho}$ from $A_{\lambda\rho\mu}$ etc. Where, however, the set is
symmetrical, so that $A_{\lambda\mu\rho} = A_{\lambda\rho\mu} =$ etc., this difficulty does
not arise.

(iii) The set of rank 0 is a single quantity or scalar.

(iv) To denote a set generally, without reference to its
suffixes, we use a Gothic letter such as $\mathfrak{A}$ or $\mathfrak{B}$.

* An alternative method, in the case of a double set or matrix,
is to enclose the symbols between two pairs of vertical lines, so as
to distinguish the set from the determinant, which has two single
lines. It is not a satisfactory symbolism from our point of view,
as it would seem to suggest that the set is more restricted than the
determinant, whereas what we are aiming at is to free the set from
the bonds of the determinant.

† I have been doubtful as to the appropriate word. In reference
to tensors Einstein uses Rang, Hilbert Ordnung, Eddington rank,
de Sitter order. The objection to either of the last two is that
there is already a settled meaning for order as regards a determinant,
and (though this is not so important) for rank as regards a matrix
(see note to § 3 (ii)). It would seem reasonable to describe a set
containing $m^f$ elements (i.e. composed of $m$ sets each containing
$m^{f-1}$ elements) as being of degree $f$. I have, however, felt bound to
keep to Eddington’s use of rank.


VI. 5. Sums and products of sets. — (i) If two or
more sets, of the same rank and the same order, have the
same suffixes, we add (or subtract) them by adding (or subtracting)
corresponding elements. Thus

\[
A_\rho + B_\rho = (A_1 + B_1 \quad A_2 + B_2 \quad A_3 + B_3 \quad \ldots \quad A_m + B_m);
\]

and similarly for sets of higher rank.

(ii) We multiply a set, of whatever rank, by a scalar
when we multiply every element of the set by the scalar;
e.g.

\[
kA_{\mu\rho} =
\begin{pmatrix}
kA_{11} & kA_{21} & kA_{31} & \cdots & kA_{m1} \\
kA_{12} & kA_{22} & kA_{32} & \cdots & kA_{m2} \\
\vdots & \vdots & \vdots & & \vdots \\
kA_{1m} & kA_{2m} & kA_{3m} & \cdots & kA_{mm}
\end{pmatrix}.
\]

Thus the determinant of $kA_{\mu\rho}$ is not $k$ times, but $k^m$ times,
the determinant of $A_{\mu\rho}$.

(iii) If $\mathfrak{A}$ and $\mathfrak{B}$ are two sets, with different suffixes, of
ranks $f$ and $g$ respectively ($f$ and $g$ not being necessarily
different), their product $\mathfrak{A}\mathfrak{B}$ is the set of rank $f + g$ obtained
by multiplying every element of one by every
element of the other. (Here, as elsewhere, we assume that
all the sets we are considering are of the same order.)
Thus the product of two single sets $A_\rho$ and $B_\sigma$ is the double
set $A_\rho B_\sigma$ obtained by giving to $\rho$ and $\sigma$ separately each of
the values 1 to $m$. A product obtained in this way is
sometimes called an outer product, to distinguish it from
an ‘inner’ product as defined in § 6 below.
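The remark in (ii), that multiplying a double set of order $m$ by $k$ multiplies its determinant by $k^m$, can be checked for $m = 3$. The sketch below uses assumed names (`det3`, `scale`) and arbitrary values; the determinant is expanded along the first row of the nested list.

```python
def det3(a):
    """Determinant of a 3 x 3 array by expansion along its first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def scale(k, a):
    """Multiply every element of the double set by the scalar k."""
    return [[k * element for element in line] for line in a]


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # illustrative values only
k = 2

print(det3(A), det3(scale(k, A)), k ** 3 * det3(A))
```

Each of the three columns contributes one factor $k$, so the scaled determinant equals $k^3$ times the original and not $k$ times.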


VI. 6. Inner product. — (i) When a suffix occurs twice
in an expression such as $A_{\nu\nu}$ or $B_\nu C_\nu$, or, more generally, in
any single expression or product, e.g. -Arp... or Ap.AA... 
(where the letters may be in any order), this means that the
expression is to be summed for the values $1, 2, \ldots m$ of the
suffix; e.g.

\[
\left.
\begin{aligned}
B_\nu C_\nu &\equiv B_1 C_1 + B_2 C_2 + \cdots + B_m C_m \\
B_\nu D_{\lambda\nu} &\equiv B_1 D_{\lambda 1} + B_2 D_{\lambda 2} + \cdots + B_m D_{\lambda m} \\
&\text{etc.}
\end{aligned}
\right\}
\tag{VI. 6. A}
\]

Where a letter occurs twice in this way, each of the two
letters is linked. Where a letter occurs once only, it is
free. The linked suffixes are called dummy, as they can be
replaced by a pair of any other suffixes not already occurring
in the expression.

(ii) In the particular case where the expression to be
summed is of the form $B_\nu C_\nu$, the result is called the inner
product of $B_\nu$ and $C_\nu$, or of $B_\lambda$ and $C_\mu$, etc. It is immaterial
what suffixes we use in this latter description, since they
have to be replaced by one and the same suffix.

(iii) From a pair of double sets A pp and B , or A Kc! and 
B up > ' ve can by a single product-summation form several 
different double sets A py B vp , A kp B pp , A kfl B kp) etc. There 
is also the scalar quantity A B pp , formed by two summa- 
tions, which can he simultaneous or successive ; if suc- 
cessive, the first is a product-summation giving us the 
double set A kp B pp or A pp B> pa . Strictly speaking, this 
scalar quantity A ^ B p? is the inner product of A and 
B W But it occurs less frequently than the double sets 
obtained by a single summation, and it is therefore more 
convenient to call one of these latter the inner product. 
\Y e shall call A pp B ^ the complete inner product of A and 
B np- By analogy with the expression found in V. 6 (vi) 
for the product of two determinants, we define the inner 
product* of two sets A pp and B plp — or, more generally, of 

This is what, in the case of matrices, is called the ‘ product ’. 
The true product of two sets A and B kp is a set of rant 4. 

If this use of ‘ inner product ’ seemed likely to lead to confusion 
with the ‘complete inner product’, we could use a different 
phrase, such as ‘ interproduct ’. It should be noted that the inter- 


VI. 6 (v) 


63 


Inner product 

two sets A^ a and B Xp — , in each of which the letters are in 
their proper alphabetical order, as the set A pv B vp ; the 
linked suffix in this latter expression being 1 chosen so as to 
be (alphabetically) intermediate between the two free suf- 
fixes. This is equivalent to saying, as regards this case, 
that the inner product of two double sets is the double set 
whose element in the q th column and xth row is the inner 
product of the i{th column of the first set and the xtli row of 
the second set ; and we apply this rule to all cases. The 
simplest way of applying it is to use the result for A pp and 
B pp and alter the order of the suffixes where necessary. 
Suppose, for instance, that we want the inner product of 
A pp and B ppi ; then by writing B=F we see that the 
inner product is A py F vp = A pv B pv . It should be noticed 
that in all cases the inner product depends on the relative 
position of the original sets ; thus the inner product of 
B n? and is not A py B vp but B pv A vp . 

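(iv) There are four main forms of inner product constructed in
accordance with (iii); and four others, which are really repetitions,
can be obtained by interchanging the two sets. Denoting the
inner product of $\mathfrak{A}$ and $\mathfrak{B}$ (in this order) by $\mathfrak{A} \times \mathfrak{B}$ (cf. IV. 5 (ii) as
to product of two determinants), the forms are as follows:

\[
\begin{array}{llcllc}
A_{\mu\rho} \times B_{\mu\rho} & = A_{\mu\nu}B_{\nu\rho} & (1) & B_{\mu\rho} \times A_{\mu\rho} & = B_{\mu\nu}A_{\nu\rho} = A_{\nu\rho}B_{\mu\nu} & (5) \\
A_{\mu\rho} \times B_{\rho\mu} & = A_{\mu\nu}B_{\rho\nu} & (2) & B_{\rho\mu} \times A_{\mu\rho} & = B_{\nu\mu}A_{\nu\rho} = A_{\nu\rho}B_{\nu\mu} & (6) \\
A_{\rho\mu} \times B_{\mu\rho} & = A_{\nu\mu}B_{\nu\rho} & (3) & B_{\mu\rho} \times A_{\rho\mu} & = B_{\mu\nu}A_{\rho\nu} = A_{\rho\nu}B_{\mu\nu} & (7) \\
A_{\rho\mu} \times B_{\rho\mu} & = A_{\nu\mu}B_{\rho\nu} & (4) & B_{\rho\mu} \times A_{\rho\mu} & = B_{\nu\mu}A_{\rho\nu} = A_{\rho\nu}B_{\nu\mu} & (8)
\end{array}
\]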

(v) The transposed of an inner product such as $A_{\mu\nu}B_{\nu\rho}$ is found
in the usual way (V. 5 (iii)) by interchanging the free suffixes $\mu$
and $\rho$. By comparison of (1) with (8), (2) with (7), etc., it will be
seen that the transposed of the inner product of two double sets is the


mediate product-sum, for A pp B pp , is not A pl ,B vp but either 
A^ p B pp or A pp B pa . There are various reasons for taking the 
former, rather than one of the two latter, as ‘the’ inner product.



inner product of the transposed sets, in reverse order; e.g. the
transposed of the inner product of $A_{\mu\rho}$ and $B_{\rho\mu}$ is the inner product of
$B_{\mu\rho}$ and $A_{\rho\mu}$.

(vi) It will be seen by comparison with IV. 5 (ii) and the 
Appendix that the rule for construction of the inner product of 
two double sets is exactly the same as that for construction of the 
product of two determinants ; so that the determinant of the inner 
product of two double sets — whether we call them (say) A pp and 
B pi j or A pr7 and B p)< — is equal to the product of the determinants 
of the two sets. 
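
The rules for the inner product, its transposed, and its determinant can be checked numerically. The following is only a minimal sketch in Python, not part of the original text. It assumes that a double set is stored as a list of lists indexed first by its first suffix and then by its second; the helper names `inner`, `transposed` and `det` are hypothetical and introduced purely for illustration.

```python
def inner(a, b):
    """Inner product of two double sets: the second suffix of the first
    set is linked with the first suffix of the second set."""
    m = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(m)) for k in range(m)]
            for i in range(m)]


def transposed(a):
    """Interchange the two suffixes of a double set."""
    return [list(line) for line in zip(*a)]


def det(a):
    """Determinant by expansion along the first line (fine for small orders)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j in range(len(a)):
        minor = [line[:j] + line[j + 1:] for line in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total


A = [[2, 1, 0], [0, 3, 1], [1, 0, 4]]
B = [[1, 2, 0], [0, 1, 5], [3, 0, 1]]

# The inner product depends on the order in which the sets are taken.
print(inner(A, B) != inner(B, A))
# Transposed of the inner product = inner product of the transposed sets, reversed.
print(transposed(inner(A, B)) == inner(transposed(B), transposed(A)))
# Determinant of the inner product = product of the determinants.
print(det(inner(A, B)) == det(A) * det(B))
```

Each of the three comparisons is expected to hold for any pair of double sets of the same order; the particular values of `A` and `B` are arbitrary.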

VI. 7. The unit set.— (i) The unit set* 

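$|^{\mu}_{\rho}$

is defined as the set whose typical term is

\[ |^{q}_{r} = \begin{cases} 1 & \text{if } r = q, \\ 0 & \text{if } r \neq q, \end{cases} \tag{VI. 7. A} \]

\[ |^{\mu}_{\rho} \equiv \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \end{bmatrix}. \tag{VI. 7. 1} \]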

(ii) From the definition it follows that the set is symmetrical, i.e.

\[ |^{q}_{r} = |^{r}_{q}, \tag{VI. 7. 2} \]

and that

\[ \bigl|\,|^{\mu}_{\rho}\,\bigr| = 1. \tag{VI. 7. 3} \]

(iii) The special property of this set is that, if $A_\lambda$ is any
set (possibly containing other suffixes $\sigma\,\tau\ldots$), then

\[ |^{\lambda}_{\mu}A_{\lambda} = A_{\mu}, \tag{VI. 7. 4} \]

so that the inner product of the unit set and any other set
* This is by analogy with the ‘ unit matrix ’. 




is the same as the latter but with the suffix changed. In
other words, the unit set acts, for inner multiplication, as
a substitution-operator.

(iv) In particular, the inner product of two unit sets is
a unit set, i.e.

\[ |^{\lambda}_{\mu}\,|^{\mu}_{\nu} = |^{\lambda}_{\nu}. \tag{VI. 7. 5} \]
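
The action of the unit set as a substitution-operator can be seen in a few lines of Python. This is an illustrative sketch only, under the assumption that sets are stored as plain lists; `unit_set`, `substitute` and `inner` are hypothetical names, not notation from the text.

```python
def unit_set(m):
    """Unit set of order m: 1 where the two suffixes agree, 0 elsewhere."""
    return [[1 if q == r else 0 for r in range(m)] for q in range(m)]


def substitute(unit, single):
    """Inner product of the unit set with a single set (sum over the linked suffix)."""
    m = len(single)
    return [sum(unit[lam][mu] * single[lam] for lam in range(m))
            for mu in range(m)]


def inner(a, b):
    """Inner product of two double sets."""
    m = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(m)) for k in range(m)]
            for i in range(m)]


A = [7, -2, 5, 11]
U = unit_set(4)

# The elements are unchanged; only the suffix that carries them has changed.
print(substitute(U, A))
# The inner product of two unit sets is again a unit set.
print(inner(U, U) == U)
```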

VI. 8. Inverse double sets.— (i) We take any double 

set $A_{\mu\rho}$ and we say that there is another set $A^{\rho\mu}$
connected with it by the condition that the inner product
of the former and the latter is a unit set, i.e. (see §§ 6 (iii)
and 7 (i)) that

\[ A_{\mu\nu}A^{\rho\nu} = |^{\rho}_{\mu}. \tag{VI. 8. 1} \]

This represents $m^2$ equations, which are sufficient to determine
the $m^2$ values of $A^{rq}$ when those of $A_{qr}$ are known.
The set $A^{\rho\mu}$, as defined by the above condition, is called the
inverse of the set $A_{\mu\rho}$. We shall keep to this notation, so
that (VI. 8. 1) will always hold, however we alter the
letters $\lambda$, $\mu$, $\nu$, $\rho$.

(ii) The above is subject to one condition. If we write
down the equations which determine the elements in, say,
the second row of $A^{\rho\mu}$, namely

\[
\begin{aligned}
A_{11}A^{21} + A_{12}A^{22} + A_{13}A^{23} + \ldots + A_{1m}A^{2m} &= 0 \\
A_{21}A^{21} + A_{22}A^{22} + A_{23}A^{23} + \ldots + A_{2m}A^{2m} &= 1 \\
A_{31}A^{21} + A_{32}A^{22} + A_{33}A^{23} + \ldots + A_{3m}A^{2m} &= 0 \\
\cdots\qquad & \\
A_{m1}A^{21} + A_{m2}A^{22} + A_{m3}A^{23} + \ldots + A_{mm}A^{2m} &= 0
\end{aligned}
\]




we see that in order that there may be a solution it is
necessary that we should have $|A_{rq}| \neq 0$, which is the
same thing as

\[ |A_{qr}| \neq 0. \tag{VI. 8. 2} \]

This applies also to the other rows. It is a sufficient as
well as a necessary condition for the existence of $A^{\rho\mu}$.

(iii) Taking it that $|A_{qr}| \neq 0$, we have, by (V. 6. 2) and
(VI. 8. 1) and (VI. 7. 3),

\[ |A^{qr}| \times |A_{rq}| = |A^{\rho\nu}A_{\mu\nu}| = \bigl|\,|^{\rho}_{\mu}\,\bigr| = 1. \tag{VI. 8. 3} \]

It follows that

\[ |A^{qr}| \neq 0, \qquad |A^{rq}| \neq 0. \tag{VI. 8. 4} \]

(iv) The statement (VI. 8. 1) is a statement as to the
$m^2$ relations obtained by taking each value of $\mu$ with each
value of $\rho$. It is therefore equally true to say, by interchanging
$\mu$ and $\rho$, that

\[ A_{\rho\nu}A^{\mu\nu} = |^{\mu}_{\rho}. \tag{VI. 8. 5} \]

The expression on the left-hand side is (§ 6 (iii)) the inner
product of $A^{\mu\rho}$ and $A_{\rho\mu}$; and these are the transposed of
$A^{\rho\mu}$ and $A_{\mu\rho}$ respectively. Hence, if $\mathfrak{B}$ is the inverse
of $\mathfrak{A}$, the transposed of $\mathfrak{A}$ is the inverse of the transposed
of $\mathfrak{B}$.

(v) The relation in (VI. 8. 1) is a relation connecting
columns of the original set and rows of the inverse set.
There is a similar relation connecting rows and columns.
For (VI. 8. 1) gives

\[ A_{\rho\lambda}A_{\mu\nu}A^{\rho\nu} = |^{\rho}_{\mu}A_{\rho\lambda} = A_{\mu\lambda} = |^{\nu}_{\lambda}A_{\mu\nu}, \]

whence, as will be shown in § 9 (v), it follows that

\[ A_{\rho\lambda}A^{\rho\nu} = |^{\nu}_{\lambda}, \tag{VI. 8. 6} \]

and hence also

\[ A_{\rho\nu}A^{\rho\lambda} = |^{\lambda}_{\nu}. \tag{VI. 8. 7} \]


The expression on the left-hand side of (VI. 8. 6) is the 
inner product of A vk and A Kv , which is the equivalent 
of that of A ph and A pp ; so that, by the definition in (i), 
the latter is the inverse of the former. Hence, if $\mathfrak{B}$ is
the inverse of $\mathfrak{A}$, then $\mathfrak{A}$ is the inverse of $\mathfrak{B}$. The
relation (VI. 8. 7) is similar to (VI. 8. 5), and shows that
the transposed of $\mathfrak{B}$ is the inverse of the transposed of $\mathfrak{A}$.
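
The properties of the inverse set stated in this section can be tried on a small example. The sketch below is an illustration under stated assumptions, not the method of the text: it finds the inverse by Gauss-Jordan elimination in exact rational arithmetic (a technique swapped in for convenience), stores a double set as a list of lists, and uses the hypothetical helper names `inverse`, `inner`, `transposed` and `unit_set`.

```python
from fractions import Fraction


def unit_set(m):
    return [[1 if q == r else 0 for r in range(m)] for q in range(m)]


def inner(a, b):
    m = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(m)) for k in range(m)]
            for i in range(m)]


def transposed(a):
    return [list(line) for line in zip(*a)]


def inverse(a):
    """Inverse double set by Gauss-Jordan elimination on exact fractions.
    Raises ValueError when the determinant is 0, so that no inverse exists."""
    m = len(a)
    rows = [[Fraction(x) for x in line] + [Fraction(int(i == j)) for j in range(m)]
            for i, line in enumerate(a)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("determinant is 0: no inverse set exists")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]
    return [line[m:] for line in rows]


A = [[2, 1, 0], [0, 3, 1], [1, 0, 4]]
B = inverse(A)

print(inner(A, B) == unit_set(3))               # B is the inverse of A
print(inner(B, A) == unit_set(3))               # and A is the inverse of B
print(inverse(transposed(A)) == transposed(B))  # transposed sets are inverse too
```

A set whose determinant vanishes, such as `[[1, 2], [2, 4]]`, makes `inverse` raise `ValueError`, which corresponds to the condition (VI. 8. 2).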


VI. 9. Reciprocation. — (i) Suppose there are two single 
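sets $X^\lambda$ and $Y_\sigma$ connected by the relation

\[ Y_\sigma = A_{\lambda\sigma}X^\lambda. \]

Then, as in V. 11, we have

\[ A^{\lambda\sigma}Y_\sigma = A^{\lambda\sigma}A_{\mu\sigma}X^\mu = |^{\lambda}_{\mu}X^\mu = X^\lambda; \]

i.e.

\[ \text{If } Y_\sigma = A_{\lambda\sigma}X^\lambda, \text{ then } X^\lambda = A^{\lambda\sigma}Y_\sigma. \tag{VI. 9. 1} \]

Similarly by taking $X^\lambda$ to be each single set, in turn,
of a set of second or higher rank, with $Y_\sigma$ to correspond,
we find that

\[ \text{If } Y_{\sigma\tau\ldots} = A_{\lambda\sigma}X^{\lambda}{}_{\tau\ldots}, \text{ then } X^{\lambda}{}_{\tau\ldots} = A^{\lambda\sigma}Y_{\sigma\tau\ldots}. \tag{VI. 9. 2} \]

(ii) Thus the operation represented by $A_{\lambda\sigma}$ is annulled
by the operation $A^{\lambda\sigma}$; and conversely. The sets $A_{\lambda\sigma}$ and
$A^{\lambda\sigma}$ will be said to be reciprocal to one another: and the
process adopted in (VI. 9. 1) and (VI. 9. 2), which we shall
have to use very frequently, will be called reciprocation.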

(iii) We see from § 8 that the reciprocal of a set is 
the transposed of the inverse of the set, and conversely. 
If a set is symmetrical, its inverse and its reciprocal are 
identical. 
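
Reciprocation can be followed on a set of order 2. The sketch below is illustrative only. It assumes list-of-lists storage with the first suffix as the first index, takes the reciprocal to be the transposed of the inverse as stated in (iii), and computes the inverse of an order-2 set from the usual determinant formula; `inverse_2x2` and `transposed` are hypothetical helper names.

```python
from fractions import Fraction


def inverse_2x2(a):
    """Inverse of a double set of order 2 (determinant must not be 0)."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if d == 0:
        raise ValueError("determinant is 0: no inverse set exists")
    return [[Fraction(a[1][1], d), Fraction(-a[0][1], d)],
            [Fraction(-a[1][0], d), Fraction(a[0][0], d)]]


def transposed(a):
    return [list(line) for line in zip(*a)]


A_low = [[2, 1], [5, 3]]                  # the coefficients, first suffix linked with X
A_up = transposed(inverse_2x2(A_low))     # the reciprocal set

X = [4, -7]
# Y is the inner product of the coefficients with X (sum over the first suffix).
Y = [sum(A_low[lam][s] * X[lam] for lam in range(2)) for s in range(2)]
# Reciprocation: X is recovered as the inner product of the reciprocal set with Y.
X_back = [sum(A_up[lam][s] * Y[s] for s in range(2)) for lam in range(2)]

print(Y)
print(X_back == X)

# For a symmetrical set the inverse and the reciprocal are identical.
S = [[2, 1], [1, 3]]
print(inverse_2x2(S) == transposed(inverse_2x2(S)))
```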


(iv) As an example of the application of (VI. 9. 1),
suppose that

\[ A_{\lambda\sigma}X^\lambda = 0 \]

for all values of $\sigma$. Then, by reciprocation,

\[ X^\lambda = A^{\lambda\sigma} \cdot 0 = 0, \]

provided that $|A_{qr}| \neq 0$. (This is practically the same
thing as saying that, if the inner products of $X^\lambda$ by $m$
independent single sets are all 0, then $X^\lambda$ is 0.)


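(v) Similarly, suppose that

\[ A_{\lambda\nu}C^{\nu}_{\sigma} = A_{\lambda\nu}D^{\nu}_{\sigma} \]

identically, i.e. for all values of $\lambda$ and $\sigma$, and that
$|A_{qr}|$ is not $= 0$. Then, by reciprocation,

\[ C^{\nu}_{\sigma} = A^{\lambda\nu}A_{\lambda\mu}D^{\mu}_{\sigma} = |^{\nu}_{\mu}D^{\mu}_{\sigma} = D^{\nu}_{\sigma}. \]

Thus we can divide both sides of the equation by $A_{\lambda\nu}$.
This supplies the missing step in § 8 (v).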


(vi) The two definitions, and the proposition, used in
the establishment of (VI. 9. 1) are

\[ |^{q}_{r} = \begin{cases} 1 & \text{if } r = q, \\ 0 & \text{if } r \neq q, \end{cases} \tag{A} \]

\[ |^{\lambda}_{\mu}X^{\mu} = X^{\lambda}, \tag{1} \]

\[ A^{\lambda\sigma}A_{\mu\sigma} = |^{\lambda}_{\mu}; \tag{B} \]

and from these we deduce (VI. 9. 1). We could have
altered the order in various ways. For instance, we
could have defined $|^{\lambda}_{\mu}$ by (1); thence, by giving $\lambda$ its
successive values, and equating coefficients, we should have
got (A). Also we might have defined $A^{\lambda\sigma}$ by (VI. 9. 1),
instead of by (B), as the coefficients of $Y_\sigma$ when the equations
$Y_\sigma = A_{\lambda\sigma}X^\lambda$ are solved for $X^\lambda$. This would give
$X^\lambda = A^{\lambda\sigma}A_{\mu\sigma}X^\mu$. Then, if we defined $|^{\lambda}_{\mu}$ by (1), we should
have $A^{\lambda\sigma}A_{\mu\sigma} = |^{\lambda}_{\mu}$; or, if we defined $|^{\lambda}_{\mu}$ by (B), we should
have $X^\lambda = |^{\lambda}_{\mu}X^\mu$, which is (1).


VI. 10. Continued inner products. — -(i) We can construct con- 
tinued inner products without ambiguity, provided we adhere to 
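the rule laid down in § 6 (iii). Suppose, for instance, that we
want the inner product of $A_{\lambda\sigma}$, $B_{\lambda\sigma}$, and $C_{\sigma\lambda}$. That of $A_{\lambda\sigma}$ and
$B_{\lambda\sigma}$, according to the rule, is $A_{\lambda\mu}B_{\mu\sigma}$. If we call this $F_{\lambda\sigma}$, then,
by (2) of § 6 (iv), the inner product of $F_{\lambda\sigma}$ and $C_{\sigma\lambda}$ is $F_{\lambda\nu}C_{\sigma\nu}$; i.e.
the inner product of $A_{\lambda\sigma}$, $B_{\lambda\sigma}$, and $C_{\sigma\lambda}$ is $A_{\lambda\mu}B_{\mu\nu}C_{\sigma\nu}$. Similarly
that of $A_{\lambda\sigma}$, $B_{\lambda\sigma}$, $C_{\sigma\lambda}$, and $D_{\lambda\sigma}$ is $A_{\lambda\mu}B_{\mu\nu}C_{\rho\nu}D_{\rho\sigma}$.

(ii) On the other hand, the inner product of $A_{\sigma\lambda}$, $B_{\sigma\lambda}$, $C_{\lambda\sigma}$, $D_{\sigma\lambda}$
is not $A_{\sigma\mu}B_{\mu\nu}C_{\rho\nu}D_{\rho\lambda}$. For, by (4) of § 6 (iv), that of $A_{\sigma\lambda}$ and $B_{\sigma\lambda}$
is $B_{\sigma\mu}A_{\mu\lambda}$. Calling this $G_{\sigma\lambda}$, the inner product of $G_{\sigma\lambda}$ and $C_{\lambda\sigma}$ is,
by (3) of § 6 (iv), $C_{\nu\sigma}G_{\nu\lambda} = C_{\nu\sigma}B_{\nu\mu}A_{\mu\lambda}$. Similarly that of $A_{\sigma\lambda}$,
$B_{\sigma\lambda}$, $C_{\lambda\sigma}$, $D_{\sigma\lambda}$ is $D_{\sigma\rho}C_{\nu\rho}B_{\nu\mu}A_{\mu\lambda}$.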

(iii) The transposed of the inner product of any number of double
sets is the inner product of the transposed sets, in reverse order;
e.g. the transposed of the inner product of $A_{\lambda\sigma}$, $B_{\lambda\sigma}$, $C_{\sigma\lambda}$, $D_{\lambda\sigma}$ is
the inner product of $D_{\sigma\lambda}$, $C_{\lambda\sigma}$, $B_{\sigma\lambda}$, $A_{\sigma\lambda}$. [For we have shown, in
§ 6 (v), that this is true for the inner product of two sets; and
thence it follows, by induction, for any number of sets.]

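(iv) The inverse of the inner product of any number of double
sets is the inner product of the inverse sets, in reverse order; e.g.
the inverse of the inner product of $B_{\alpha\sigma}$, $C_{\alpha\sigma}$, $D_{\alpha\sigma}$, $E_{\alpha\sigma}$ is the inner
product of $E^{\sigma\alpha}$, $D^{\sigma\alpha}$, $C^{\sigma\alpha}$, $B^{\sigma\alpha}$, i.e.

\[ \text{If } A_{\alpha\sigma} = B_{\alpha\beta}C_{\beta\gamma}D_{\gamma\delta}E_{\delta\sigma}, \text{ then } A^{\sigma\alpha} = B^{\sigma\beta}C^{\beta\gamma}D^{\gamma\delta}E^{\delta\alpha}. \tag{VI. 10. 1} \]

[Denote this latter expression (right-hand side) by $F^{\sigma\alpha}$, and alter
$\beta\,\gamma\,\delta$ in it to $\mu\,\nu\,\rho$. Then the inner product of $A_{\alpha\sigma}$ and $F^{\sigma\alpha}$ is

\[
\begin{aligned}
A_{\alpha\lambda}F^{\sigma\lambda} &= B_{\alpha\beta}B^{\sigma\mu}C_{\beta\gamma}C^{\mu\nu}D_{\gamma\delta}D^{\nu\rho}E_{\delta\lambda}E^{\rho\lambda} \\
&= B_{\alpha\beta}B^{\sigma\mu}C_{\beta\gamma}C^{\mu\nu}D_{\gamma\delta}D^{\nu\delta} \\
&= B_{\alpha\beta}B^{\sigma\mu}C_{\beta\gamma}C^{\mu\nu}\,|^{\nu}_{\gamma} \\
&= B_{\alpha\beta}B^{\sigma\mu}\,|^{\mu}_{\beta} \\
&= |^{\sigma}_{\alpha}.
\end{aligned}
\]

Hence, by reciprocation,

\[ F^{\sigma\lambda} = A^{\alpha\lambda}\,|^{\sigma}_{\alpha} = A^{\sigma\lambda}, \]

so that $F^{\sigma\alpha} = A^{\sigma\alpha}$.]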

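(v) It follows from (iv) and (iii), since $A^{\alpha\sigma}$ is the transposed of
$A^{\sigma\alpha}$, that the reciprocal of the inner product of any number of
double sets is the inner product of the reciprocal sets; e.g.

\[ \text{If } A_{\alpha\sigma} = B_{\alpha\beta}C_{\beta\gamma}D_{\gamma\delta}E_{\delta\sigma}, \text{ then } A^{\alpha\sigma} = B^{\alpha\beta}C^{\beta\gamma}D^{\gamma\delta}E^{\delta\sigma}. \tag{VI. 10. 2} \]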
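
The two rules for continued inner products, reverse order for inverse sets and the same order for reciprocal sets, can be checked on sets of order 2. This is a sketch under assumptions, not the author's procedure: sets are stored as lists of lists, the reciprocal is taken as the transposed of the inverse, and `inverse_2x2`, `reciprocal`, `inner` and `transposed` are hypothetical helper names.

```python
from fractions import Fraction


def inner(a, b):
    m = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(m)) for k in range(m)]
            for i in range(m)]


def transposed(a):
    return [list(line) for line in zip(*a)]


def inverse_2x2(a):
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if d == 0:
        raise ValueError("determinant is 0: no inverse set exists")
    return [[Fraction(a[1][1], d), Fraction(-a[0][1], d)],
            [Fraction(-a[1][0], d), Fraction(a[0][0], d)]]


def reciprocal(a):
    """Reciprocal set, taken here as the transposed of the inverse set."""
    return transposed(inverse_2x2(a))


B = [[1, 2], [3, 5]]
C = [[2, 1], [1, 1]]
D = [[0, 1], [1, 4]]
P = inner(inner(B, C), D)   # continued inner product of B, C, D

# Inverse of the product = inner product of the inverse sets in reverse order.
print(inverse_2x2(P) == inner(inner(inverse_2x2(D), inverse_2x2(C)), inverse_2x2(B)))
# Reciprocal of the product = inner product of the reciprocal sets, same order.
print(reciprocal(P) == inner(inner(reciprocal(B), reciprocal(C)), reciprocal(D)))
```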




VI. 11. Partial sets. — (i) When we are dealing with a set 

\[ A_\lambda \equiv (A_1\ A_2\ A_3 \ldots A_m) \]

we sometimes want to consider the separate or mutual relations
of groups of the A’s. The simplest case is when the set divides
into two groups, one group consisting of $k$ elements, which we
shall take to be the first $k$, and the other group consisting of the
other $m - k$ elements. If we use suffixes $\alpha\,\beta\,\gamma \ldots$ in reference to
the first group, and $\phi\,\chi\,\psi \ldots$ in reference to the second, reserving
$\lambda\,\mu\,\nu \ldots$ for the set as a whole, we may treat the two groups as
partial single sets of orders $k$ and $m - k$ respectively, and write

\[ A_\alpha \equiv (A_1\ A_2 \ldots A_k), \qquad A_\phi \equiv (A_{k+1}\ A_{k+2} \ldots A_m). \]


(ii) In the same way a double set $A_{\mu\rho}$ may fall into four groups
by division by two lines cutting off $k$ columns and $k$ rows
respectively. We could denote these groups by

\[ A_{\alpha\gamma}, \quad A_{\phi\gamma}, \quad A_{\alpha\psi}, \quad A_{\phi\psi}, \]

$\phi$ being regarded as coming before $\gamma$. The groups $A_{\alpha\gamma}$ and $A_{\phi\psi}$
would be partial double sets of orders $k$ and $m - k$ respectively. The
groups $A_{\alpha\psi}$ and $A_{\phi\gamma}$ would each have different numbers of columns
and of rows, and therefore would not be double sets; but this
would usually not matter, as we should be specially concerned with
$A_{\alpha\gamma}$ and $A_{\phi\psi}$. The important point to notice is that, if we take
$A_{\alpha\gamma}$, say, as a partial set and construct the inverse set $A^{\gamma\alpha}$ or the
reciprocal set $A^{\alpha\gamma}$, the set so constructed will not in general be
the same as the set made up of the corresponding elements of the
set inverse or reciprocal to $A_{\mu\rho}$. The inverse set $A^{\gamma\alpha}$, for instance,
is given by

\[ A_{\alpha\beta}A^{\gamma\beta} = |^{\gamma}_{\alpha} \]

with summations made only from 1 to $k$ instead of from 1 to $m$.
To avoid mistake, we may write it $(A^{\gamma\alpha})_k$. Similarly the set
inverse to the partial set $A_{\phi\psi}$ may be written $(A^{\psi\phi})_{m-k}$.

(iii) If, however, all the elements in the portions $A_{\alpha\psi}$ and $A_{\phi\gamma}$
are 0, so that the set $A_{\mu\rho}$ practically consists only of the two
partial sets $A_{\alpha\gamma}$ and $A_{\phi\psi}$, this is also the case for the complete
reciprocal set $A^{\mu\rho}$: all the elements in $A^{\alpha\psi}$ and $A^{\phi\gamma}$ are 0, and
the elements of $A^{\alpha\gamma}$ and $A^{\phi\psi}$ are just the same whether they are
regarded as obtained from the complete set $A_{\mu\rho}$ or from the partial
sets $A_{\alpha\gamma}$ and $A_{\phi\psi}$.
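
The warning about partial sets can be seen on the smallest possible case, a double set of order 2 divided into partial sets of order 1. The sketch below is illustrative only; it assumes list-of-lists storage, and `inverse_2x2` is a hypothetical helper using the usual determinant formula.

```python
from fractions import Fraction


def inverse_2x2(a):
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if d == 0:
        raise ValueError("determinant is 0: no inverse set exists")
    return [[Fraction(a[1][1], d), Fraction(-a[0][1], d)],
            [Fraction(-a[1][0], d), Fraction(a[0][0], d)]]


# Complete set of order m = 2, divided with k = 1.
complete = [[2, 1], [5, 3]]
partial_first = complete[0][0]                  # the partial set of order 1

inverse_of_partial = Fraction(1, partial_first)  # summation from 1 to k only
corner_of_inverse = inverse_2x2(complete)[0][0]  # taken from the complete inverse

# In general the two do not agree.
print(inverse_of_partial, corner_of_inverse)

# When the off-diagonal portions are all 0, they do agree.
split = [[2, 0], [0, 3]]
print(inverse_2x2(split)[0][0] == Fraction(1, split[0][0]))
print(inverse_2x2(split)[1][1] == Fraction(1, split[1][1]))
```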

(iv) If we had two or more sets, single or double, divided in the
manner described above, we could take portions from different sets
to form new sets. If, for instance, we had divided $A_\lambda$ into $A_\alpha$ and
$A_\phi$, and $B_\lambda$ into $B_\alpha$ and $B_\phi$ (orders again $k$ and $m - k$), we could
construct a new set consisting of $A_\alpha$ and $B_\phi$.

(v) It would, of course, be incorrect to describe this new set as
being $A_\alpha + B_\phi$, or the old set as being $A_\alpha + A_\phi$: for we cannot
add together two sets of different order. We could, however,
look at the matter in another way. Consider the two sets

\[ (A_1\ A_2 \ldots A_k\ 0\ 0 \ldots 0), \]

\[ (0\ 0 \ldots 0\ A_{k+1}\ A_{k+2} \ldots A_m). \]

The sum of these, if we regard each as having the suffix $\lambda$, is $A_\lambda$;
and in this sense, if we denote the two sets by $A_\alpha$ and $A_\phi$, and
regard $\alpha$ and $\phi$ as connoting $\lambda$, we could say that

\[ A_\lambda = A_\alpha + A_\phi. \]

If we compare $A_\lambda$ with a vector, we see that $A_\alpha$ and $A_\phi$ correspond
to the projections of $A_\lambda$ on a ‘plane’, i.e. a surface of the first
degree, passing through the first $k$ axes, and on a ‘plane’ through
the last $m - k$ axes, respectively.

The extreme form, if we made further divisions, would be that
in which $A_\lambda$ was split up into $m$ component single sets, each
having $m - 1$ 0’s in it. It would only be in this sense that we could
describe the set as being the sum of its $m$ components.


VII. RELATED SETS OF VARIABLES 


VII. 1. Variable sets.— (i) In the earlier chapters we 
considered the manner in which determinants arose in 
solving a set of equations of the form 

\[ (s = 1, 2, \ldots m) \qquad d_{1s}X^1 + d_{2s}X^2 + \ldots + d_{ms}X^m = Y_s; \tag{1} \]

and in the chapter preceding this we have considered the
general aspects of the system under which we express these
equations and their solution in the form

\[ Y_\sigma = d_{\lambda\sigma}X^\lambda, \qquad X^\lambda = d^{\lambda\sigma}Y_\sigma. \tag{2} \]

According to the definitions we gave to the notation, $X^\lambda$
and $Y_\sigma$ are each used in different senses in the two places
where they occur in (2): $X^\lambda$ means ‘the elements of $X^\lambda$’
on its first occurrence and ‘each element of $X^\lambda$’ on its
second occurrence; and similarly for $Y_\sigma$, but in the reverse
order. We have, however, by this time practically
reached the stage of treating a set as a whole, so that
we can now regard (2) as a pair of statements, one of which
gives an expression for the set $Y_\sigma$ in terms of the set $X^\lambda$,
while the other expresses $X^\lambda$ in terms of $Y_\sigma$. The form
of either expression determines the nature of the relation
between the two sets.

(ii) The special features of the particular case were that
the X’s were unknown quantities which we wanted to find,
that the d’s were coefficients, more or less accidental, and
that the Y’s were known quantities arising from the application
of these coefficients to the X’s; and, more important,
that this was merely an isolated set of equations, for which
we had no further use when we had found the X’s.

(iii) The relations with which we have to deal in this
and subsequent chapters are of a different nature. We
have a set $\mathfrak{A}$ and a set $\mathfrak{B}$, each consisting of a number
(the same for both) of elements, which we will call the
A’s and the B’s. In each set the elements need not be all
of the same kind. Each of the A’s is a variable; i.e.
it either has, or can (as in the theory of statistics or of
error) be regarded as having, a very large number of
actual or possible values. These variables, the values
of which are algebraically independent,* together constitute
the variable set $\mathfrak{A}$. In the same way the B’s are
variables, and constitute another variable set $\mathfrak{B}$. But
the two sets of variables are not independent of each
other: they are connected by certain relations, by means
of which the B’s are known if the A’s are known, and conversely.
Thus the B’s are functions of the A’s, in the
ordinary sense of the word, and the A’s are functions of the
B’s. In this case we say that $\mathfrak{B}$ is a function of $\mathfrak{A}$, and
$\mathfrak{A}$ a function of $\mathfrak{B}$. But we must not only say it, but
think it; i.e. we must treat the functional relations of
the A’s and the B’s rather as interpreting the nature of
the functionality of $\mathfrak{A}$ and $\mathfrak{B}$ than as actually constituting
this functionality.

(iv) In the particular case we have been considering,
$\mathfrak{A}$ and $\mathfrak{B}$ were the single sets $X^\lambda$ and $Y_\lambda$, and the relation
between them was linear; i.e. the Y’s were linear functions
of the X’s, and the X’s were therefore linear functions of

* By algebraical independence of m quantities we mean that 
each may have any of its values, whatever the values of the other 
m — 1 may be. This does not imply statistical independence, which 
is a different thing. 




the Y’s. In such a case we say that $Y_\lambda$ is a linear function
of $X^\lambda$, and it follows that $X^\lambda$ is a linear function of $Y_\lambda$.

(v) In dealing with the theory of the subject, as distinct
from its applications, we are concerned not with the actual
values of elements of sets but with the relations between
the sets. Thus in the case of the linear relation
$Y_\sigma = d_{\lambda\sigma}X^\lambda$, where the variable single set $Y_\sigma$ is expressed
in terms of the variable single set $X^\lambda$ and the fixed double
set $d_{\lambda\sigma}$, the elements of $d_{\lambda\sigma}$ form a kind of framework

\[
\begin{aligned}
Y_1 &= d_{11}X^1 + d_{21}X^2 + d_{31}X^3 + \ldots + d_{m1}X^m \\
Y_2 &= d_{12}X^1 + d_{22}X^2 + d_{32}X^3 + \ldots + d_{m2}X^m \\
&\;\;\vdots \\
Y_m &= d_{1m}X^1 + d_{2m}X^2 + d_{3m}X^3 + \ldots + d_{mm}X^m
\end{aligned}
\]

into which the values of the X’s and the Y’s can be fitted;
and what we are really investigating are the properties and
mutual relations of such frameworks. In the present
chapter we shall consider certain simple relations between
two such frameworks, namely relations between the linear
relation of one pair of sets and the linear relation of another
pair of sets.


VII. 2. Direct proportion of single sets.— (i) If a 
quantity $Z$ is a linear function of $m$ X’s, which we will call
$X^1, X^2, \ldots X^m$, it is of the form

\[ Z = h_1X^1 + h_2X^2 + \ldots + h_mX^m = h_\lambda X^\lambda. \tag{1} \]

This is the simplest form of statement of a linear relation.
Suppose, for instance, that $Z$ is the 3rd difference of the
X’s, formed in the usual way, i.e.

\[ Z = \Delta\Delta\Delta X^1. \]

This is equivalent to $Z = X^4 - 3X^3 + 3X^2 - X^1$, so that
$h_1 = -1$, $h_2 = 3$, $h_3 = -3$, $h_4 = 1$, $h_5 = h_6 = \ldots = h_m = 0$.


The h’s having these values, the X’s may alter, but (1) will
always give the 3rd difference.
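
The point that the coefficients stay fixed while the variables alter can be tried directly. The following Python sketch is an illustration only; the function names are hypothetical, and the single sets are stored as plain lists whose first element corresponds to suffix 1.

```python
def third_difference_coefficients(m):
    """The fixed single set h for the 3rd difference: (-1, 3, -3, 1, 0, ..., 0)."""
    h = [0] * m
    h[:4] = [-1, 3, -3, 1]
    return h


def linear_function(h, x):
    """Inner product of the coefficient set with the variable set."""
    return sum(a * b for a, b in zip(h, x))


def forward_difference(x):
    """One step of differencing 'in the usual way'."""
    return [b - a for a, b in zip(x, x[1:])]


h = third_difference_coefficients(5)

X = [1, 8, 27, 64, 125]     # one variable set
A = [2, 3, 5, 7, 11]        # another variable set, same coefficients

for values in (X, A):
    direct = forward_difference(forward_difference(forward_difference(values)))[0]
    print(linear_function(h, values), direct)
```

Both variable sets use the same `h`, which is the sense in which the second quantity is "the same function" of its variables as the first.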

(ii) Now suppose that there is another set $A^\lambda$, and that
$C$ is the same function of the A’s that $Z$ is of the X’s;
e.g., as in the above example, that it is the 3rd difference.
Then the h’s are the same, so that

\[ C = h_\lambda A^\lambda. \tag{2} \]

We can write (1) in the form

\[ Z/X^\lambda = h_\lambda, \]

on the understanding that a suffix in a denominator is linked
with a similar suffix on the other side and implies an inner
multiplication. Similarly we can write (2) as

\[ C/A^\lambda = h_\lambda. \]

Equating the two values of $h_\lambda$, we have

\[ \frac{C}{A^\lambda} = \frac{Z}{X^\lambda} \]

as our way of stating that $C$ is the same linear function
of the A’s that $Z$ is of the X’s.


(iii) Next suppose that a set $Y_\rho$* is a linear function of
the set $X^\mu$, so that each of the Y’s is a linear function of
the X’s. Then we can take $Z$ of (1) to be each of the
Y’s in turn: but the sets of h’s will be different, so that the
relation will be of the form

\[ Y_\rho = d_{\mu\rho}X^\mu. \]

If there is also a set $B_\rho$, each element of which is the same
linear function of the A’s that the corresponding element
of $Y_\rho$ is of the X’s, then

\[ B_\rho = d_{\mu\rho}A^\mu, \]

the two sets of d’s being the same. We therefore have*

\[ \frac{B_\rho}{A^\mu} = \frac{Y_\rho}{X^\mu} \]

as a statement of the fact that the B’s have the same linear
relations to the A’s that the Y’s have to the X’s. This would
be the case, for instance, if the Y’s were the successive
differences of the X’s, and the B’s were those of the A’s
according to the same system.

* The set might be called either $Y^\rho$ or $Y_\rho$; we choose the latter
as giving a convenient symbol $d_{\mu\rho}$ for the coefficient of $X^\mu$ (see V.

In view of the variety of ways in which we are able to
deal with sets according to algebraical laws, it is perhaps
permissible to describe this as a case of direct proportion,
and to say that $B_\rho$ bears the same ratio to $A^\mu$ that $Y_\rho$
bears to $X^\mu$. This ‘ratio’, here denoted by $d_{\mu\rho}$, is really
the operator that is required to convert $A^\mu$ into $B_\rho$ or $X^\mu$
into $Y_\rho$.

(iv) If the linear relation of the B’s to the A’s is the
same as that of the Y’s to the X’s, then that of the A’s
to the B’s is the same as that of the X’s to the Y’s
(or, if the ratio of $B_\rho$ to $A^\mu$ is the same as that of $Y_\rho$ to
$X^\mu$, then the ratio of $A^\mu$ to $B_\rho$ is the same as that of
$X^\mu$ to $Y_\rho$); i.e.

\[ \text{If } \frac{B_\rho}{A^\mu} = \frac{Y_\rho}{X^\mu}, \text{ then } \frac{A^\mu}{B_\rho} = \frac{X^\mu}{Y_\rho}. \tag{VII. 2. 1} \]

[Let $B_\rho = d_{\mu\rho}A^\mu$. Then $A^\mu = d^{\mu\rho}B_\rho$. Similarly $X^\mu = d^{\mu\rho}Y_\rho$.
Therefore $A^\mu/B_\rho = X^\mu/Y_\rho$.]

* This expression $B_\rho/A^\mu$ must not be confused with $B_\rho/A^\mu$ as
the double set whose typical element is $B_r/A^q$. The limitation in
V. 5 (vi) excludes double sets of this kind from consideration.


(v) The above sets of relations can be expressed by the
diagram in Fig. 1. The crosses may be taken as representing
either coefficients of the X’s in the values of the Y’s and
of the A’s in the values of the B’s; or, in virtue of (VII. 2. 1),
coefficients of the Y’s in the values of the X’s and
of the B’s in the values of the A’s.

[Fig. 1. Array of crosses, with rows labelled $A^3, X^3$; $A^2, X^2$; $A^1, X^1$ and columns labelled $Y_1, Y_2, Y_3$ and $B_1, B_2, B_3$.]


(vi) Ratios of the kind considered above can be com- 
bined according to the laws of ordinary algebra ; e.g. 

A* B v ~ B v A f > ( ■ ■ ) 

The expression on the left-hand side is, of course, an inner 
product. A special ratio is 

L ^ = \>\ * = (VII. 2. 3) 

Example. If $B_\rho/A^\mu = Y_\rho/X^\mu$, prove that $B_\rho/A^\mu \cdot X^\mu/Y_\sigma = |^{\sigma}_{\rho}$.


VII. 3. Reciprocal proportion of single sets.— 
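(i) The other important class of cases is that in which the
linear relation (or ratio) of $B_\rho$ to $A^\mu$ is the reciprocal of that
of $Y_\rho$ to $X^\mu$, i.e. in which, $B_\rho$ and $A^\mu$ being altered to
$B^\rho$ and $A_\mu$,

\[ B^\rho = d^{\mu\rho}A_\mu. \]

This gives

\[ \frac{B^\rho}{A_\mu} = d^{\mu\rho}, \]

so that

\[ \frac{B^\rho}{A_\mu} = \frac{X^\mu}{Y_\rho}. \]

We can call this a case of reciprocal proportion.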


(ii) If the ratio of $B^\rho$ to $A_\mu$ is the reciprocal of that of
$Y_\rho$ to $X^\mu$, then the ratio of $Y_\rho$ to $X^\mu$ is the reciprocal
of that of $B^\rho$ to $A_\mu$; i.e.

\[ \text{If } \frac{B^\rho}{A_\mu} = \frac{X^\mu}{Y_\rho}, \text{ then } \frac{A_\mu}{B^\rho} = \frac{Y_\rho}{X^\mu}. \tag{VII. 3. 1} \]

[Let $B^\rho = k^{\mu\rho}A_\mu$. Then $A_\mu = k_{\mu\rho}B^\rho$. Similarly
$Y_\rho = k_{\mu\rho}X^\mu$. Therefore $A_\mu/B^\rho = Y_\rho/X^\mu$.]

(iii) The sets of relations can be expressed by a diagram
as in Fig. 2, where the crosses represent coefficients of the Y’s
in the X’s and of the A’s in the B’s, or of the X’s in
the Y’s and of the B’s in the A’s.


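(iv) The inner products of the reciprocally corresponding
sets are equal; i.e.

\[ \text{If } \frac{B^\rho}{A_\mu} = \frac{X^\mu}{Y_\rho}, \text{ then } B^\rho Y_\rho = A_\mu X^\mu. \tag{VII. 3. 2} \]

[Let $B^\rho = k^{\mu\rho}A_\mu$. Then $X^\mu = k^{\mu\rho}Y_\rho$; and therefore
$Y_\rho = k_{\lambda\rho}X^\lambda$. Hence $B^\rho Y_\rho = k^{\mu\rho}A_\mu k_{\lambda\rho}X^\lambda = k^{\mu\rho}k_{\lambda\rho}A_\mu X^\lambda
= |^{\mu}_{\lambda}A_\mu X^\lambda = A_\mu X^\mu$.]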
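
The equality of the two inner products under reciprocal proportion can be confirmed on a small numerical case. The sketch below is illustrative only: the two coefficient sets `k_low` and `k_up` are hypothetical values chosen so that they are reciprocal to one another (which the code checks rather than assumes), and single sets are stored as plain lists.

```python
def is_reciprocal_pair(k_low, k_up):
    """True when the inner product over the second suffix gives the unit set."""
    m = len(k_low)
    return all(
        sum(k_up[mu][rho] * k_low[lam][rho] for rho in range(m)) == (1 if mu == lam else 0)
        for mu in range(m) for lam in range(m)
    )


k_low = [[2, 1], [5, 3]]      # ratio of Y to X
k_up = [[3, -5], [-1, 2]]     # ratio of B to A, reciprocal to the above

X = [4, -7]
A = [1, 2]

# Y is formed from X with k_low, and B from A with k_up (sum over the first suffix).
Y = [sum(k_low[lam][rho] * X[lam] for lam in range(2)) for rho in range(2)]
B = [sum(k_up[mu][rho] * A[mu] for mu in range(2)) for rho in range(2)]

print(is_reciprocal_pair(k_low, k_up))
print(sum(b * y for b, y in zip(B, Y)), sum(a * x for a, x in zip(A, X)))
```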

(v) An interesting case is that in which the Y’s are the successive
differences of the X’s. It will be found that in this case the B’s
are linear functions of successive sums, and therefore of successive
moments, of the A’s. In the ordinary system, for instance, which
is such as to give $Y_1 = X^1$, $Y_2 = \Delta X^1 = X^2 - X^1$, $Y_3 = \Delta\Delta X^1 =
X^3 - 2X^2 + X^1, \ldots$, we have $X^1 = Y_1$, $X^2 = Y_1 + Y_2$, $X^3 =
Y_1 + 2Y_2 + Y_3, \ldots$; and these give $B^1 = -\Sigma A_1$, $B^2 = +\Sigma\Sigma A_2$,
$B^3 = -\Sigma\Sigma\Sigma A_3, \ldots$, the constants in the sums being so chosen
that $\Sigma A_{m+1} = 0$, $\Sigma\Sigma A_{m+1} = 0$, $\Sigma\Sigma\Sigma A_{m+1} = 0, \ldots$


VII. 4. Cogredience and contragredience. — (i) In 
§§ 2 and 3 we have not assumed the existence of any relation
between $A_\lambda$ and $X^\lambda$ or between $B_\lambda$ and $Y_\lambda$. Where a relation
does exist, the important cases are those of cogredience and
contragredience. We start with a set $X^\lambda$, and a set $A_\lambda$ which
is derived in a definite way from $X^\lambda$, in other words is
a function of $X^\lambda$. We then take $Y_\lambda$ to be any linear function
of $X^\lambda$, and $B_\lambda$ to be derived from $Y_\lambda$ in the same way
that $A_\lambda$ is derived from $X^\lambda$. Then $B_\lambda$ is some function of
$A_\lambda$. Of the cases in which this is a linear function, we are
concerned with two special classes:

(1) If $B_\rho/A^\mu$ is always $= Y_\rho/X^\mu$, then $A^\lambda$ and $X^\lambda$ are
said to be cogredient.

(2) If $B^\rho/A_\mu$ is always $= X^\mu/Y_\rho$, then $A_\lambda$ and $X^\lambda$
are said to be contragredient.

An example of cogredience is given in IX. 6 (x), and of
contragredience in VIII. 3 (iv) and IX. 4 (v).

(ii) Instead of saying that $A_\lambda$ and $X^\lambda$ are cogredient or
contragredient, we might say that $A_\lambda$ in the one case varies
directly as $X^\lambda$ and in the other case varies reciprocally as $X^\lambda$.
When we say that $A_\lambda$ varies directly as $X^\lambda$, we mean that,
if $X^\lambda$ is multiplied by any double set involving $\lambda$, $A_\lambda$ is
multiplied by the same set: when we say that $A_\lambda$ varies
reciprocally as $X^\lambda$, we mean that, if $X^\lambda$ is multiplied by any
double set involving $\lambda$, $A_\lambda$ is multiplied by the reciprocal of
this set.


(iii) If $A_\lambda$ and $X^\lambda$ are cogredient (or contragredient), and $P^\lambda$ and $A_\lambda$
are also cogredient (or, respectively, contragredient), then $P^\lambda$ and $X^\lambda$
are cogredient; if $A_\lambda$ and $X^\lambda$ are cogredient (or contragredient), but
$P^\lambda$ and $A_\lambda$ are contragredient (or, respectively, cogredient), then $P^\lambda$
and $X^\lambda$ are contragredient.

[Suppose, for instance, that $A_\lambda$ and $X^\lambda$ are contragredient,
and $P^\lambda$ and $A_\lambda$ are also contragredient. Then we have
relations of the form

\[ B^\rho/A_\mu = X^\mu/Y_\rho, \qquad Q_\rho/P^\mu = A_\mu/B^\rho, \]

whence, by (VII. 3. 1),

\[ Q_\rho/P^\mu = Y_\rho/X^\mu, \]

so that $P^\lambda$ and $X^\lambda$ are cogredient.]

VII. 5. Contragredience with linear relation. — 
(i) The simplest case of contragredience is that in which 
the contragredient sets are connected by a linear relation. 

(a) Suppose that, with the notation of § 3, the relation
between $A_\mu$ and $X^\mu$ is

\[ A_\mu = a_{\mu\nu}X^\nu. \]

Then, if $Q$ denotes the inner product (cross-product) in
(VII. 3. 2),

\[ Q = A_\mu X^\mu = a_{\mu\nu}X^\mu X^\nu = a_{11}X^1X^1 + (a_{12} + a_{21})X^1X^2
+ a_{22}X^2X^2 + (a_{13} + a_{31})X^1X^3 + \ldots + a_{mm}X^mX^m. \]

Thus $Q$ is a quadratic in $X^1, X^2, X^3, \ldots X^m$, i.e. in $X^\mu$. Also,
since each of the four sets $X^\mu$, $Y_\rho$, $A_\mu$, $B^\rho$ is a linear function
of each of the others, $Q$ can be expressed in a good many
other ways, e. g. as a quadratic in PP, or in the form IP? A Y 
or (TA^ ~ M ' 

(b) If the relation between $A_\mu$ and $X^\mu$ is symmetrical,
i.e. if $a_{\mu\rho} = a_{\rho\mu}$, it can be shown that we shall, in addition
to (VII. 3. 1), have the further relations

\[ \frac{A_\mu}{Y_\rho} = \frac{B^\rho}{X^\mu}, \qquad \frac{Y_\rho}{A_\mu} = \frac{X^\mu}{B^\rho}. \]

(ii) Conversely, suppose that we are dealing with a set
of quantities or coordinates $X^\mu \equiv (X^1\ X^2 \ldots X^m)$, and that
we come across an expression

\[ Q = a_{\mu\nu}X^\mu X^\nu, \]

where $a_{\mu\nu} = a_{\nu\mu}$. Then we may construct a new set $X_\mu$
given by

\[ X_\mu = a_{\mu\nu}X^\nu, \]

and we shall have

\[ Q = X_\mu X^\mu. \]

Now suppose we change the system of coordinates linearly
or replace the X’s by some linear functions of
them. The a’s will normally have some definite meaning;
and this meaning, though not their actual values, will
remain unchanged when the X’s are changed. Suppose
that, when $X^\mu$ becomes $Y^\mu$, $X_\mu$ (as based on this meaning
of the a’s) becomes $Y_\mu$. Then we shall have


and also 


= xm;, 

yf x yu y ^ 


VII. 6. Ratios of sets generally.— (i) The word ratio 
has so far only been used in reference to single sets connected
by a linear relation; if the relation between $Y_\rho$ and $X^\mu$
is of the form $Y_\rho = d_{\mu\rho}X^\mu$, we call $d_{\mu\rho}$ the ratio of $Y_\rho$ to $X^\mu$,
and we call $d^{\mu\rho}$ the reciprocal of this ratio. We can
extend the use of the word to sets other than single sets.

(ii) We have already had examples of a ratio which
involves a scalar. Thus in § 2 we had relations $Z = h_\lambda X^\lambda$,
$C = h_\lambda A^\lambda$, and we said that

\[ \frac{C}{A^\lambda} = \frac{Z}{X^\lambda}. \]

Here we can quite well call $h_\lambda$ the ratio of $C$ to $A^\lambda$ or of $Z$ to $X^\lambda$;


i.e. it is the ratio of a scalar to a single set. Similarly in § 5 (ii)
the statement

Q 

X v X V Y* 

may be regarded as a statement that the ratio of Q to X v T M 
is equal to that of Y to X v . In each case the ratio of one 
set to another is the set by which the latter has to be multi- 
plied in order to obtain the former. 

(iii) The more important case is that of the ratio of
a variable set of any rank to another variable set of the
same rank. If $\mathfrak{A}$ is a variable set of any rank, and $\mathfrak{B}$
is a set which is a function of $\mathfrak{A}$ and is of the same rank,
and if the relation between $\mathfrak{B}$ and $\mathfrak{A}$ is of the form

\[ \mathfrak{B} = \rho\,\mathfrak{A}, \]

where $\rho$ is a constant set whose symbol contains all the
suffixes occurring in $\mathfrak{A}$ and $\mathfrak{B}$, then we can call $\rho$ the ratio
of $\mathfrak{B}$ to $\mathfrak{A}$, and denote this ratio by $\mathfrak{B}/\mathfrak{A}$.

(iv) In these cases we can continue to speak of the
relation of $\mathfrak{B}$ to $\mathfrak{A}$ as linear. Also, by solving the equations,
we find that there is a relation of the form

\[ \mathfrak{A} = \rho'\,\mathfrak{B}, \]

so that, if $\mathfrak{B}$ is a linear function of $\mathfrak{A}$, then $\mathfrak{A}$ is a linear
function of $\mathfrak{B}$. Here, $\rho$ being the ratio of $\mathfrak{B}$ to $\mathfrak{A}$, $\rho'$ is
the ratio of $\mathfrak{A}$ to $\mathfrak{B}$, and we can call each ratio the reciprocal
of the other.

(v) As an example, suppose that $\mathfrak{A}$ and $\mathfrak{B}$ are of rank 3,
and that $\rho$ is the product of three double sets, each of which
has inner multiplication with $\mathfrak{A}$. Then the relation might
be of the form


It is easy to show that in this case


Thus the reciprocal of the product of the three double sets 
is the product of their reciprocals. 

VII. 7. Related sets of higher rank. — With the 
preceding explanation, there is no difficulty in extending
the ideas of equality of ratios, and of related systems of
sets, to sets of higher rank.

Suppose, for instance, that we have a set $\mathfrak{A}$ which is
a function of three single sets, and a set $\mathfrak{B}$ which is a
function of three other single sets. If the three latter sets
were functions of the three former, $\mathfrak{B}$ would be a function
of $\mathfrak{A}$. The cases analogous to those considered in § 4
would be the cases in which there were linear relations
between corresponding sets. Suppose that X K , Y k , Z k are 
linear functions of U x , J' K , IV X respectively, e.g. 



that 2L is a certain function (in the most general sense) 
of U K , T x , W x , and that 25 is the same function of 
A' a , T a , Z k . Then 25 is some function of 21. If we sup- 
pose that, when the values of a Kp , c vT are made to 
vary, 25 is always a linear function of 21, and the ratio of 
25 to 21 is always compounded of the ratios of X K to U K , of 
Y k to V K , and of Z K to IV X , each taken directly or recipro- 
cally, we get an extension of the cases considered in § 4 . 
Thus we might have 



so that 


each of these latter expressions representing a set of rank 6. 



In this particular case we can say that A k f is cogredient 
as regards U k and F h and contragredient as regards W v ; or 
we can say that it varies directly as regards U K and W and 
reciprocally as regards W\ or that it is directly proportional 
to U k and I ^ and reciprocally proportional to W v . 


VIII. DIFFERENTIAL RELATIONS OF SETS 


VIII. 1. Derivative of a set.— We have now to con- 
sider the cases in which two sets vary together continuously, 
so that there can be a derivative (differential coefficient) of 
one with regard to the other, this latter being a single set. 
The derivative will in all cases be a partial one, since the 
elements of the single set vary independently. 


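(i) The simplest case is that of a scalar linear function

\[ Z = h_\lambda X^\lambda = h_1X^1 + h_2X^2 + \ldots + h_mX^m. \]

Here

\[ \frac{\partial Z}{\partial X^\rho} = h_\rho. \]

Giving $\rho$ all values from 1 to $m$, we can write this

\[ \frac{\partial Z}{\partial X^\lambda} = h_\lambda = \frac{Z}{X^\lambda}, \]

and we can regard $h_\lambda$ as the derivative of the set $Z$ (which
is of rank 0) with regard to the set $X^\lambda$.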


(ii) Similarly, if

\[ Y_\rho = d_{\mu\rho}X^\mu, \]

then*

\[ \frac{\partial Y_\rho}{\partial X^\mu} = d_{\mu\rho} = \frac{Y_\rho}{X^\mu}. \]

* It should, however, be noticed that in this statement the sign
‘=’ has not the same meaning in the two places in which it is
used. When we say that $\partial Y_\rho/\partial X^\mu = d_{\mu\rho}$, we mean that $\partial Y_r/\partial X^q =
d_{qr}$ for all values of $q$ and $r$: but, when we say that $Y_\rho/X^\mu = d_{\mu\rho}$,
we mean that $Y_r = d_{\mu r}X^\mu$ for all values of $r$.




(iii) A particular case of (ii) is where $Y^\lambda = X^\lambda$. Here
$\partial X^r / \partial X^q$ is $= 1$ or $0$ according as $r$ is $= q$ or $\neq q$, since the
X's vary independently. Hence (see (VII. 2. 3))

    $\partial X^\mu / \partial X^\lambda = |^{\mu}_{\lambda}$.    (VIII. 1. 1)

(iv) Taking also sets of higher rank, and not limiting
ourselves to linear functions, we see that the derivative of
a set with regard to a single set $A^\lambda$ is a set of rank higher
by 1 than that of the original set.

VIII. 2. Derivative of sum or product.— (i) The 
derivatives of sums and products of sets follow the ordinary 
laws of derivatives of sums and products ; e. g. 

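    $\frac{\partial (B + C)}{\partial A^\lambda} = \frac{\partial B}{\partial A^\lambda} + \frac{\partial C}{\partial A^\lambda}$,    (VIII. 2. 1)

    $\frac{\partial (BC)}{\partial A^\lambda} = \frac{\partial B}{\partial A^\lambda} C + B \frac{\partial C}{\partial A^\lambda}$.    (VIII. 2. 2)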


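(ii) As a particular case of this last result, take the scalar
quadratic form considered in VII. 5,

    $Q = a_{\lambda\mu} X^\lambda X^\mu$.

Here, taking (VIII. 1. 1) into account, we have

    $\frac{\partial Q}{\partial X^\rho} = a_{\lambda\mu} \frac{\partial X^\lambda}{\partial X^\rho} X^\mu + a_{\lambda\mu} X^\lambda \frac{\partial X^\mu}{\partial X^\rho} = a_{\rho\mu} X^\mu + a_{\lambda\rho} X^\lambda = 2 a_{\rho\mu} X^\mu$,

the last step using the symmetry $a_{\lambda\mu} = a_{\mu\lambda}$. This can be
verified by expressing $Q$ in terms of the X's and finding the
partial derivative with regard to $X^\rho$ in the usual way.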
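The verification suggested here can be sketched numerically. The following is only an illustration: the coefficients `a`, the point `x`, and the function names are hypothetical, and the set $a_{\lambda\mu}$ is assumed symmetric. The closed form $2 a_{\rho\mu} X^\mu$ is compared with a central-difference estimate of each partial derivative.

```python
def quadratic_form(a, x):
    """Q = a[l][m] * x[l] * x[m], summed over both indices."""
    n = len(x)
    return sum(a[l][m] * x[l] * x[m] for l in range(n) for m in range(n))


def gradient_by_formula(a, x):
    """dQ/dx[r] = 2 * a[r][m] * x[m]; this form relies on a being symmetric."""
    n = len(x)
    return [2 * sum(a[r][m] * x[m] for m in range(n)) for r in range(n)]


def gradient_by_differences(a, x, h=1e-6):
    """Central-difference estimate of dQ/dx[r], one coordinate at a time."""
    grad = []
    for r in range(len(x)):
        up, down = list(x), list(x)
        up[r] += h
        down[r] -= h
        grad.append((quadratic_form(a, up) - quadratic_form(a, down)) / (2 * h))
    return grad


# Hypothetical symmetric coefficients and a sample point (assumptions).
a = [[2.0, 1.0, 0.5],
     [1.0, 3.0, -1.0],
     [0.5, -1.0, 4.0]]
x = [1.0, -2.0, 0.5]

exact = gradient_by_formula(a, x)
approx = gradient_by_differences(a, x)
print(exact)
print(approx)
```

The two printed lists should agree to within the accuracy of the finite differences.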


(iii) For an application of this, suppose that

    $a_{\lambda\mu} X^\lambda X^\mu = f_{\lambda\mu} X^\lambda X^\mu$

for all values of the X's. By taking adjoining values of the X's
we get the result which is expressed by differentiation,
namely

    $2 a_{\lambda\nu} X^\nu = 2 f_{\lambda\nu} X^\nu$.

Differentiating again, or equating coefficients, we find that

    $a_{\lambda\nu} = f_{\lambda\nu}$.


VIII. 3. Derivative of function of a set.— (i) If $z$
is a function of $y$, and $y$ is a function of $x$, then we know
that

    $\frac{dz}{dy} \frac{dy}{dx} = \frac{dz}{dx}$,

and, total and partial differential coefficients being in this
case identical,

    $\frac{\partial z}{\partial y} \frac{\partial y}{\partial x} = \frac{\partial z}{\partial x}$.

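(ii) Now suppose that $B^\mu$ is a function of $A^\lambda$, and $C_\rho$ is
a function of $B^\mu$. Then, $p$ and $r$ being values of $\lambda$ and $\rho$
respectively, we know that

    $\frac{\partial C_r}{\partial A^p} = \frac{\partial C_r}{\partial B^1} \frac{\partial B^1}{\partial A^p} + \frac{\partial C_r}{\partial B^2} \frac{\partial B^2}{\partial A^p} + \dots + \frac{\partial C_r}{\partial B^m} \frac{\partial B^m}{\partial A^p} = \frac{\partial C_r}{\partial B^\mu} \frac{\partial B^\mu}{\partial A^p}$.

Hence, giving $p$ and $r$ all their values,

    $\frac{\partial C_\rho}{\partial A^\lambda} = \frac{\partial C_\rho}{\partial B^\mu} \frac{\partial B^\mu}{\partial A^\lambda}$.    (VIII. 3. 1)

The argument applies to sets of higher rank. If, e.g.,
we are dealing with $C_{\rho\sigma\dots}$, where $\sigma\dots$ relates to
aspects independent of $B^\mu$, then

    $\frac{\partial C_{\rho\sigma\dots}}{\partial A^\lambda} = \frac{\partial C_{\rho\sigma\dots}}{\partial B^\mu} \frac{\partial B^\mu}{\partial A^\lambda}$.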


(iii) As a particular case (see (VIII. 1. 1)), 
ZC P ZB P <)C P 


(VIII. 3. 2) 



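(iv) Suppose that the relation between $B^\mu$ and $A^\lambda$ is
linear, say

    $B^\mu = a_{\lambda\mu} A^\lambda$.

Then, replacing $C_\rho$ or $C_{\rho\sigma\dots}$ by $C$, we have

    $\frac{\partial C}{\partial A^\lambda} = \frac{\partial C}{\partial B^\mu} \frac{\partial B^\mu}{\partial A^\lambda} = \frac{\partial C}{\partial B^\mu} a_{\lambda\mu}$.

Hence

    $\frac{\partial C}{\partial A^\lambda} \Big/ \frac{\partial C}{\partial B^\mu} = B^\mu / A^\lambda$.

Thus $A^\lambda$ and $\partial C / \partial A^\lambda$ are contragredient. We can express
this by saying that $A^\lambda$ and the operator $\partial / \partial A^\lambda$ are
contragredient.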

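(v) The determinant of $\partial B^\mu / \partial A^\lambda$ is the Jacobian of $B^\mu$
with regard to $A^\lambda$; i.e.

    $\left| \frac{\partial B^\mu}{\partial A^\lambda} \right| = \frac{\partial (B^1, B^2, \dots B^m)}{\partial (A^1, A^2, \dots A^m)}$.    (VIII. 3. 4)

From (VIII. 3. 1), taken with (V. 10. 5), we have the
ordinary formula for the product of two Jacobians:

    $\left| \frac{\partial B^\mu}{\partial A^\lambda} \right| \times \left| \frac{\partial C_\rho}{\partial B^\mu} \right| = \left| \frac{\partial C_\rho}{\partial A^\lambda} \right|$.    (VIII. 3. 5)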

VIII. 4. Transformation of quadratic form to sum 
of squares. — (i) For an example of a Jacobian, take the 
case in which a quadratic form is to be expressed as the sum 
of the squares of linear functions of the variables. Let the 
quadratic form be 


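    $Q = a_{\lambda\mu} X^\lambda X^\mu$,    (1)

where

    $a_{\lambda\mu} = a_{\mu\lambda}$.

Let

    $Y_\sigma = b_{\lambda\sigma} X^\lambda$    (2)

be a set of linear functions of the X's, so that

    $X^\lambda = b^{\lambda\sigma} Y_\sigma$.    (3)

Suppose that the b's are chosen so that (1) gives

    $Q = Y_1 Y_1 + Y_2 Y_2 + \dots + Y_m Y_m = Y_\sigma Y_\sigma$.    (4)

Then, by substitution from (2),

    $Q = b_{\lambda\sigma} X^\lambda \, b_{\mu\sigma} X^\mu = b_{\lambda\sigma} b_{\mu\sigma} X^\lambda X^\mu$.

Hence, by comparison with (1) (see § 2 (iii)),

    $a_{\lambda\mu} = b_{\lambda\sigma} b_{\mu\sigma}$.    (5)

The Jacobian which we should usually want to find is
that of $X^\lambda$ with regard to $Y_\sigma$, i.e.

    $J \equiv \frac{\partial (X^1, X^2, \dots X^m)}{\partial (Y_1, Y_2, \dots Y_m)}$.

By (3),

    $\frac{\partial X^\lambda}{\partial Y_\sigma} = b^{\lambda\sigma}$,

and therefore

    $J = | b^{\lambda\sigma} |$.    (6)

But (5) gives

    $a^{\lambda\mu} = b^{\lambda\sigma} b^{\mu\sigma}$,

and therefore

    $| a^{\lambda\mu} | = | b^{\lambda\sigma} b^{\mu\sigma} | = | b^{\lambda\sigma} | \times | b^{\mu\sigma} | = JJ$.

Hence, combining this with (6),

    $J = | b^{\lambda\sigma} | = \{ | a^{\lambda\mu} | \}^{1/2} = 1 / \{ | a_{\lambda\mu} | \}^{1/2}$.    (7)

Similarly the Jacobian of $Y_\sigma$ with regard to $X^\lambda$ is

    $J' \equiv \frac{\partial (Y_1, Y_2, \dots Y_m)}{\partial (X^1, X^2, \dots X^m)} = | b_{\lambda\sigma} | = \{ | a_{\lambda\mu} | \}^{1/2} = 1 / \{ | a^{\lambda\mu} | \}^{1/2}$.    (8)

There are a good many different ways of expressing $Q$ as
in (4); but they all give the same two Jacobians.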
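One of the many possible choices of the b's can be sketched as follows. This is an illustration under stated assumptions, not the author's construction: the triangular (Cholesky) factor is used as one convenient set satisfying (5), and the matrix `a` and the helper names are hypothetical. The sketch checks that the determinant of the b's equals the square root of $| a_{\lambda\mu} |$, as in (8), and that $J$ is its reciprocal, as in (7).

```python
import math


def cholesky(a):
    """Lower-triangular b with a[l][m] = sum over s of b[l][s] * b[m][s]."""
    n = len(a)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(b[i][k] * b[j][k] for k in range(j))
            if i == j:
                b[i][j] = math.sqrt(a[i][i] - s)
            else:
                b[i][j] = (a[i][j] - s) / b[j][j]
    return b


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical symmetric positive-definite coefficients (an assumption).
a = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 3.0],
     [0.0, 3.0, 11.25]]
b = cholesky(a)

# b is triangular, so its determinant is the product of its diagonal elements.
jac_y_wrt_x = b[0][0] * b[1][1] * b[2][2]   # J' of (8)
jac_x_wrt_y = 1.0 / jac_y_wrt_x             # J of (7)

print(jac_y_wrt_x, math.sqrt(det3(a)))
print(jac_x_wrt_y, 1.0 / math.sqrt(det3(a)))
```

Any other factorisation satisfying (5) would give the same determinants up to sign.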



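(ii) Hence we easily obtain the value of the multiple integral

    $D \equiv \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} e^{-\frac{1}{2} Q} \, dX^1 dX^2 \dots dX^m$.    (9)

Using $\left( \int \right)^m$ to denote $\int \int \dots$ ($m$ times), we have

    $D = \left( \int_{-\infty}^{\infty} \right)^m e^{-\frac{1}{2} Q} J \, dY_1 dY_2 \dots dY_m$
      $= J \int_{-\infty}^{\infty} e^{-\frac{1}{2} Y_1 Y_1} dY_1 \dots \int_{-\infty}^{\infty} e^{-\frac{1}{2} Y_m Y_m} dY_m$
      $= \{ | a^{\lambda\mu} | \}^{1/2} (2\pi)^{\frac{1}{2} m}$.    (10)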
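The value in (10) can be checked numerically for a small case. The sketch below is only an illustration: the two-variable coefficients `a`, the grid spacing and the truncation of the range at $\pm 8$ are assumptions chosen so that the neglected tails are negligible. A plain Riemann sum over a square grid is compared with $(2\pi)^{\frac{1}{2} m} / \{ | a_{\lambda\mu} | \}^{1/2}$, which is the same thing as the right-hand side of (10).

```python
import math

# Hypothetical symmetric positive-definite coefficients for m = 2.
a = [[2.0, 1.0],
     [1.0, 2.0]]
det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]


def q(x1, x2):
    """The quadratic form Q for the two variables."""
    return a[0][0] * x1 * x1 + 2 * a[0][1] * x1 * x2 + a[1][1] * x2 * x2


h = 0.05                                   # grid spacing (assumption)
n = 160                                    # grid runs from -8 to +8
grid = [i * h for i in range(-n, n + 1)]

# Riemann sum of exp(-Q/2) over the square grid.
numeric = h * h * sum(math.exp(-0.5 * q(x1, x2)) for x1 in grid for x2 in grid)

# Closed form of (10) with m = 2: (2 pi)^(m/2) divided by the square root of |a|.
closed_form = (2 * math.pi) / math.sqrt(det_a)

print(numeric, closed_form)
```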


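(iii) We shall require, in the next chapter, the value of $N/D$, where

    $N \equiv \left( \int_{-\infty}^{\infty} \right)^m \tfrac{1}{2} X^q \frac{\partial Q}{\partial X^r} e^{-\frac{1}{2} Q} \, dX^1 dX^2 \dots dX^m$    (11)

and $D$ is as above. We find the value of $N$, as we have found that
of $D$, by expressing everything in terms of Y's. By (2) and (4),
and (VIII. 3. 1) and § 2 (ii),

    $\frac{\partial Q}{\partial X^r} = \frac{\partial Y_\tau}{\partial X^r} \frac{\partial Q}{\partial Y_\tau} = 2 b_{r\tau} Y_\tau$;

and therefore, by (3),

    $\tfrac{1}{2} X^q \frac{\partial Q}{\partial X^r} = b^{q\sigma} b_{r\tau} Y_\sigma Y_\tau$,

    $N = b^{q\sigma} b_{r\tau} \left( \int \right)^m Y_\sigma Y_\tau e^{-\frac{1}{2} Q} \, dX^1 dX^2 \dots dX^m$
      $= b^{q\sigma} b_{r\tau} J \left( \int \right)^m Y_\sigma Y_\tau e^{-\frac{1}{2} Q} \, dY_1 dY_2 \dots dY_m$.    (12)

Let us write

    $M_\tau \equiv b^{q\sigma} J \left( \int \right)^m Y_\sigma Y_\tau e^{-\frac{1}{2} Q} \, dY_1 dY_2 \dots dY_m$,    (13)

so that

    $N = b_{r\tau} M_\tau$.    (14)

Then $M_t$ consists of $m$ terms due to the $m$ values of $\sigma$. Since $Q$ is
the sum of the squares of the Y's, the only term which counts in
the integration is that for which $\sigma = t$. Also we know that

    $\int_{-\infty}^{\infty} Y_t Y_t e^{-\frac{1}{2} Y_t Y_t} \, dY_t = \int_{-\infty}^{\infty} e^{-\frac{1}{2} Y_t Y_t} \, dY_t$.

Hence it follows that

    $M_t = b^{qt} J \left( \int \right)^m e^{-\frac{1}{2} Q} \, dY_1 dY_2 \dots dY_m = b^{qt} D$,

and therefore, by (14),

    $N/D = b^{q\tau} b_{r\tau}$.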


IX. EXAMPLES FROM THE THEORY 
OF STATISTICS * 


IX. 1 . Preliminary.— (i) The special feature of a statis- 
tical set 

    $X_\lambda \equiv (X_1 \, X_2 \, X_3 \dots X_m)$,

of the kind which we have to consider in this chapter, is
that each X has one only of a very large number of actual
or possible values, which together constitute the field from
which the X is drawn; and the fundamental facts with
which we are concerned are the relative frequencies of
occurrence of the various possible combinations formed by
taking an X from each of the $m$ fields. Thus the X's are
variables, and the expression for the relative frequency of
joint occurrence of a particular set $(X_1 \, X_2 \dots X_m)$ involves
the X's of the set, with certain constants. In a large class
of cases the constants depend on the mean values of the X's
and the mean squares and products of their deviations
from their respective means. It is to these cases that the
new notation is specially applicable. It may be that some
of the X's are drawn from the same field; we shall proceed
as if the fields were all different, but this does not affect
the validity of the reasoning.

(ii) We shall only consider two kinds of cases. 

(a) The first kind of case is where our statistical
information relates to a large number of individuals, and
$X_1, X_2, X_3, \dots$ are the measures of specified attributes, such
as height, head-length, chest-expansion, intelligence, etc.,

* I have dealt with the problems of this chapter in as general 
terms as possible. The explanations in small print may help to 
show the statistical student the way in which the problems 
actually arise. 



of any individual. Here the 'variability' of any X has
reference to the different values that it takes for different
individuals. The frequency of joint occurrence of particular
values of $X_1, X_2, X_3, \dots$ may be a complicated function of
these values and of certain constants which are to be
determined.

(b) The other kind of case is that in which the question
is one of 'graduation' or 'fitting'. Here $X_1, X_2, X_3, \dots$
are observed values of different quantities (e.g. rates of
mortality at successive ages), or are the results of observation
of one quantity at different times or by different
observers. The 'variability' of any particular X lies in the
fact that the observed value contains an unknown error;
and our treatment is based on the assumption that a relation
of a particular kind holds between the true X's.

In cases of this latter kind it should be noticed that
the only things which we treat as variables are the errors
in the X's. We may take, as the typical case, the observed
rates of mortality $X_1, X_2, X_3, \dots$ at ages $t_1, t_2, t_3, \dots$. If we
denote the true rates by $\xi_1, \xi_2, \xi_3, \dots$, then the assumption
which we make is really an assumption that $\xi$ is a certain
function, with constants to be determined, of $t$. So far as
this function is concerned, $\xi$ and $t$ might be called variables.
But, for our purpose, they are not variables. We are concerned
with the fixed values $t_1, t_2, t_3, \dots$ and the corresponding
fixed, though unknown, values $\xi_1, \xi_2, \xi_3, \dots$; the real variables
are the differences between the observed values $X_1, X_2, X_3, \dots$
and the true values $\xi_1, \xi_2, \xi_3, \dots$.

(iii) There are two reasons why the same mathematical methods 
apply to subjects so different as relativity and statistical theory. 
One is that the number of X’s in a statistical set may be very large : 
in the second kind of case mentioned in (ii) it may be as many as 
20 or 30. The need of a condensed notation is therefore even 



greater than for relativity, where the number of dimensions does 
not exceed 4. The other reason is that, as in the relativity theory, 
we are, to a certain extent in the first kind of case, and very 
largely in the second kind of case, concerned with sets constructed 
from the original sets by means of linear relations. 

(iv) We denote the mean product of the deviations of
$X_q$ and $X_r$ by $(X_q . X_r)$, or, more briefly, by $f_{qr}$; i.e.

    $f_{qr} \equiv (X_q . X_r) \equiv$ mean value of
                $(X_q - \text{mean } X_q)(X_r - \text{mean } X_r)$.

It must be clearly understood that, though $(X_q . X_r)$
depends on $q$ and $r$, it relates to the complete fields
from which $X_q$ and $X_r$ are drawn, and, for any particular
values of $q$ and $r$, is not a variable, like $X_q$ and $X_r$,
but a constant.

(v) Although we have defined $(X_q . X_r)$ as a mean value,
our dealings with it ultimately depend on the algebraical
laws which it follows. These are, first, that it satisfies the
ordinary laws of multiplication of two expressions $X_q$ and
$X_r$, i.e. that, $c$ being a constant as regards the X's,

    $(X_q . X_r) = (X_r . X_q)$,  $(X_q . (X_r + X_s)) = (X_q . X_r) + (X_q . X_s)$,
    $(X_q . cX_r) = c (X_q . X_r)$;    (IX. 1. 1)

and next that, if $u$ is any linear function of the X's, then
$(u . u)$ is positive unless $u = 0$, i.e. that

    $(u . u) > 0$ if $u \neq 0$.    (IX. 1. 2)

It is clear that $(u . u) = 0$ if $u = 0$; for (IX. 1. 1) gives, by
putting $c = 0$,

    $(X_q . 0) = 0$,    (IX. 1. 3)

whence $(u . 0) = 0$ follows by means of (IX. 1. 1) (cf. (vi)
below). These are the only properties we shall use; and
our results will therefore be true for any meaning of
$(X_q . X_r)$ that satisfies these laws, provided, of course, that
$(X_q . X_r)$ is a constant as regards all the X's. And, conversely,
we shall only be dealing with sets for which
$(X_q . X_r)$ has a meaning and a value for each value of $q$ with
each value of $r$, and satisfies these laws. It will be assumed
that the values of $(X_q . X_r)$ are known; or, at any rate, that
our results are final when expressed in terms of these
values. Whether we are dealing with mean squares or
mean products or not, we can call $(X_q . X_r)$ the ( . ) of $X_q$
and $X_r$.

In the first kind of case mentioned in (ii) $(X_q . X_q)$ would usually
be the mean square of deviation of $X_q$ from its mean, i.e. would
be the square of the standard deviation; and $(X_q . X_r)$ would be
the mean product of deviations of $X_q$ and $X_r$ from their respective
means, and would therefore, in the case of normal correlation, be
equal to the product of the standard deviations of $X_q$ and $X_r$
multiplied by their coefficient of correlation. In the second kind
of case $(X_q . X_q)$ is the mean square of error of $X_q$, and $(X_q . X_r)$ is
the mean product of errors of $X_q$ and $X_r$.

(vi) It follows from (IX. 1. 1) that

    $(a^\mu X_\mu . X_r) = a^\mu (X_\mu . X_r)$,  $(a^\mu X_\mu . b^\rho X_\rho) = a^\mu b^\rho (X_\mu . X_\rho)$.    (IX. 1. 4)
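The laws of the ( . ) can be illustrated on a small table of figures. The sketch below is only an illustration: the observations `X`, the coefficients `a` and `b`, and the function names are hypothetical, and the ( . ) is taken in its original sense of a mean product of deviations. Exact fractions are used so that the two sides of (IX. 1. 4) can be compared without rounding.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean of a list of Fractions."""
    return sum(values, Fraction(0)) / len(values)


def mean_product(u, v):
    """The ( . ) of two variables: mean product of deviations from their means."""
    mu, mv = mean(u), mean(v)
    return sum(((p - mu) * (q - mv) for p, q in zip(u, v)), Fraction(0)) / len(u)


def combine(coeffs, columns):
    """Values of the linear function coeffs[i] * columns[i], one per observation."""
    return [sum((c * col[j] for c, col in zip(coeffs, columns)), Fraction(0))
            for j in range(len(columns[0]))]


# Hypothetical observations of three attributes on five individuals.
X = [[Fraction(v) for v in row] for row in ([1, 2, 3, 4, 5],
                                            [2, 1, 4, 3, 6],
                                            [1, 0, 0, 1, 3])]
a = [Fraction(2), Fraction(-1), Fraction(3)]
b = [Fraction(1, 2), Fraction(4), Fraction(-2)]

# Left and right sides of the second relation in (IX. 1. 4).
lhs = mean_product(combine(a, X), combine(b, X))
rhs = sum((a[m] * b[r] * mean_product(X[m], X[r])
           for m in range(3) for r in range(3)), Fraction(0))
print(lhs, rhs)
```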

(vii) If $X_\lambda$ is a set of the kind considered in this section
then so also is any other set which is a linear function
of $X_\lambda$. Suppose, for instance, that

    $Y^\rho = b^{\lambda\rho} X_\lambda$.

Then, by (IX. 1. 4),

    $(Y^q . Y^r) = (b^{\lambda q} X_\lambda . b^{\nu r} X_\nu) = b^{\lambda q} b^{\nu r} (X_\lambda . X_\nu)$,

which has a definite meaning for each value of $q$ with each
value of $r$, and can be shown to satisfy the laws stated in (v).

(viii) Our results are also subject to the condition that
none of the determinants of the double sets we have to
deal with are 0; i.e. that $| (X_q . X_r) | \neq 0$, whether the range
of values of $q$ and $r$ is the whole of the range 1 to $m$ or
a part of it only.

IX. 2. Mean-product set.— (i) The quantities $(X_q . X_r)$
constitute a symmetrical double set

    $f_{\lambda\mu} \equiv (X_\lambda . X_\mu)$.    (IX. 2. A)

We call this the mean-product set.

(ii) Corresponding to this there is a reciprocal set
$f^{\lambda\mu} = f^{\mu\lambda}$ given by

    $f^{\lambda\nu} f_{\mu\nu} = |^{\lambda}_{\mu}$.    (IX. 2. 1)


(iii) If $(X_p . X_s) = 0$, we can for the purpose of this
chapter describe $X_p$ and $X_s$ as statistically independent.
Strictly speaking, this is a loose description, since the
complete statistical independence of two variables $X_p$ and
$X_s$ would imply a good deal more than that the mean
product of their deviations from their respective means 
should be 0. But we are only concerned, here, with mean 
squares and mean products. 

(iv) The simplest class of cases, from the point of view
of algebraical treatment, consists of those cases in which
the X's are statistically independent of one another and the
mean square of deviation of each X is 1. We can express
this by duplicating the set, thus:

    $X_1 \; X_2 \; X_3 \; \dots \; X_m$
    $X_1 \; X_2 \; X_3 \; \dots \; X_m$

and saying that the ( . ) of corresponding elements of the
two sets is 1, and that the ( . ) of elements which do not
correspond is 0.



For a set of this kind we have

    $f_{qr} \equiv (X_q . X_r) = |^{q}_{r}$,

so that the mean-product set is the unit set. It follows
that in this class of cases the mean-product set and its
reciprocal set are identical.

(v) The next kind of case, in point of simplicity, is that in 
which the X’s are statistically independent of one another but the 
mean squares of deviation are not all 1. This would be the case, 
for instance, if the X's were independent observations, not all of 
the same weight, of a single quantity. For practical purposes 
a case of this kind can be brought under (iv) by expressing each 
X in terms of its standard deviation (square root of mean square of 
deviation) as the unit. 

(vi) There is also an important class of cases in which the X's
fall into two groups, such that each X in one group is statistically
independent of each X in the other group. If, as in VI. 11 (i), we
denote the two groups by $X_\alpha$ and $X_\phi$, then the property is that

    $(X_\alpha . X_\phi) = 0$.

IX. 3. Conjugate sets. — (i) When a set $X_\lambda$ is not of
the simple kind described in § 2 (iv), we shall find it useful
to introduce another set $X^\lambda$ which (1) is a linear function of
$X_\lambda$ and (2) is such that, if we place the sets opposite one
another, thus:

    $X_1 \; X_2 \; X_3 \; \dots \; X_m$
    $X^1 \; X^2 \; X^3 \; \dots \; X^m$

the ( . ) of corresponding elements of the two sets is 1, and
that of elements which do not correspond is 0. This new
set $X^\lambda$ is said to be conjugate to $X_\lambda$.


(ii) The second of the above conditions can be written

    $(X^p . X_q) = |^{p}_{q}$,    (IX. 3. 1)

or

    $(X^\lambda . X_\mu) = |^{\lambda}_{\mu}$.    (IX. 3. 2)

(iii) Each element $X^p$ of the new set will contain $m$
terms, with $m$ coefficients which have to be determined
from the $m$ equations given by

    $(X^p . X_\mu) = |^{p}_{\mu}$.

There are altogether $m^2$ equations to determine $X^\lambda$. By
regrouping these according to the values of $p$ in $(X^\lambda . X_p)$,
we see that if $X^\lambda$ is conjugate to $X_\lambda$ then $X_\lambda$ is conjugate to
$X^\lambda$. This is in fact evident from the symmetry of (IX. 3. 2).

(iv) To express $X^\lambda$ in terms of $X_\lambda$, or $X_\lambda$ in terms of $X^\lambda$,
let us first take $W$ to be any linear function of $X_\lambda$, say

    $W = a^\mu X_\mu$.    (1)

Then we want to find an expression for $a^\mu$.

As we know the value of $(X^\lambda . X_\mu)$, we take the ( . ) of
$W$ and $X^\lambda$. By (IX. 1. 4) and (IX. 3. 2) we find that

    $(W . X^\lambda) = (a^\mu X_\mu . X^\lambda) = a^\mu (X_\mu . X^\lambda) = a^\mu |^{\lambda}_{\mu} = a^\lambda$,

whence

    $a^\mu = (W . X^\mu)$.

Substituting in (1),

    $W = (W . X^\mu) X_\mu$.    (IX. 3. 3)

Taking $W$ to be each element of $X^\lambda$ in turn, we have

    $X^\lambda = (X^\lambda . X^\mu) X_\mu$.    (IX. 3. 4)

We do not yet know the values of $(X^\lambda . X^\mu)$. But, if we had
started with $W$ as a linear function of $X^\lambda$, we should similarly
have got

    $W = (W . X_\mu) X^\mu$,    (IX. 3. 5)

whence

    $X_\lambda = (X_\lambda . X_\mu) X^\mu$.    (IX. 3. 6)

Writing this in the form

    $X_\lambda = f_{\lambda\mu} X^\mu$,    (IX. 3. 7)

we have, by reciprocation,

    $X^\mu = f^{\lambda\mu} X_\lambda$,    (IX. 3. 8)

which gives $X^\mu$ in terms of $X_\lambda$.

Further, comparing (IX. 3. 8) with (IX. 3. 4), we see that

    If $(X_\lambda . X_\mu) = f_{\lambda\mu}$, then $(X^\lambda . X^\mu) = f^{\lambda\mu}$;    (IX. 3. 9)

and, of course, the converse also holds. This result is
dependent on the assumption, made in § 1 (viii), that
$| f_{qr} |$ is not 0.
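The construction of a conjugate set can be sketched on a small table of figures. This is an illustration under stated assumptions: the observations `X` and the helper names are hypothetical, the ( . ) is taken as the mean product of deviations, and the reciprocal set is found by ordinary elimination in exact fractions. The sketch forms $f_{\lambda\mu}$, its reciprocal $f^{\lambda\mu}$, and the conjugate values from (IX. 3. 8), and then checks (IX. 3. 2) and (IX. 3. 9).

```python
from fractions import Fraction


def mean_product(u, v):
    """The ( . ) of two variables: mean product of deviations from their means."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / n


def inverse(m):
    """Reciprocal of a square set of Fractions, by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical observations of three attributes on five individuals.
X = [[Fraction(v) for v in row] for row in ([1, 2, 3, 4, 5],
                                            [2, 1, 4, 3, 6],
                                            [1, 0, 0, 1, 3])]
m = len(X)

f = [[mean_product(X[q], X[r]) for r in range(m)] for q in range(m)]
f_inv = inverse(f)

# Conjugate set, element by element: X^mu = f^{lambda mu} X_lambda.
conj = [[sum(f_inv[l][mu] * X[l][j] for l in range(m)) for j in range(len(X[0]))]
        for mu in range(m)]

unit = [[Fraction(int(p == q)) for q in range(m)] for p in range(m)]
cross = [[mean_product(conj[p], X[q]) for q in range(m)] for p in range(m)]
conj_products = [[mean_product(conj[p], conj[q]) for q in range(m)] for p in range(m)]
print(cross == unit, conj_products == f_inv)
```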

(v) We could have obtained (IX. 3. 4) and (IX. 3. 6) in
fewer steps by considering the set as a whole instead of
element by element. If we assume

    $X^\lambda = a^{\lambda\mu} X_\mu$,

then we get

    $(X^\lambda . X^\nu) = (a^{\lambda\mu} X_\mu . X^\nu) = a^{\lambda\mu} |^{\nu}_{\mu} = a^{\lambda\nu}$,

so that

    $a^{\lambda\mu} = (X^\lambda . X^\mu)$.

This gives (IX. 3. 4); and (IX. 3. 6) can be obtained in
the same way.

(vi) We can write (IX. 3. 3) in the form

    $W / X_\mu = (W . X^\mu)$;    (IX. 3. 10)

and similarly from (IX. 3. 5)

    $W / X^\mu = (W . X_\mu)$.    (IX. 3. 11)

(vii) If the set $X_\lambda$ is of the special kind considered in
§ 2 (iv), i.e. is such that

    $(X_\lambda . X_\mu) = |^{\lambda}_{\mu}$,

then

    $X^\lambda = (X^\lambda . X^\mu) X_\mu = |^{\lambda}_{\mu} X_\mu = X_\lambda$,

so that the set is identical with its conjugate. The set is
said to be self-conjugate.

(viii) In a case of the kind mentioned in § 2 (v), where $f_{pq} = 0$
if $q \neq p$, but $f_{pp}$ is not necessarily $= 1$, it may be shown that
$f^{pp} = 1 / f_{pp}$, and $f^{pq} = 0$ if $q \neq p$, so that $X^p = X_p / f_{pp}$.

(ix) Next consider a case of the kind mentioned in § 2 (vi),
where $X_\lambda$ consists of two portions, the elements in each portion
being independent of those in the other portion. As before, we
take one portion to consist of the first $k$ elements, and the other of
the remaining $m - k$, and we denote the two portions by $X_\alpha$ and
$X_\phi$. Then the special property is that

    $f_{\alpha\phi} \equiv (X_\alpha . X_\phi) = 0$,    (1)

whence, as in VI. 11 (iii), it follows that

    $f^{\alpha\phi} = 0$.    (2)

Breaking up the right-hand side of (IX. 3. 7) into two portions, we
get, according as $\lambda$ belongs to the first or to the second portion,
the two separate results

    $X_\alpha = f_{\alpha\gamma} X^\gamma$,  $X_\phi = f_{\phi\psi} X^\psi$.    (3)

Similarly, from (IX. 3. 8),

    $X^\alpha = f^{\alpha\gamma} X_\gamma$,  $X^\phi = f^{\phi\psi} X_\psi$.    (4)

In finding $f^{\alpha\gamma}$ and $f^{\phi\psi}$ from $f_{\lambda\mu}$, it is (see VI. 11 (iii)) immaterial
whether we take the set $f_{\lambda\mu}$ as a whole or the sets $f_{\alpha\gamma}$ and $f_{\phi\psi}$
separately, so that (4) may equally well be written (see VI. 11 (ii))

    $X^\alpha = (f^{\alpha\gamma})_k X_\gamma$,  $X^\phi = (f^{\phi\psi})_{m-k} X_\psi$.    (5)

IX. 4. Conjugate sets with linear relations.— 

(i) Let $Y_\rho$ be any set which is a linear function of $X_\lambda$, and
therefore of $X^\lambda$; and let $Y^\rho$ be its conjugate set. Then,
taking $W$ in (IX. 3. 10) to be each element of $Y_\rho$ in turn,
we have

    $Y_\rho / X_\lambda = (Y_\rho . X^\lambda)$.    (IX. 4. 1)

Similarly

    $Y_\rho / X^\lambda = (Y_\rho . X_\lambda)$,  $Y^\rho / X_\lambda = (Y^\rho . X^\lambda)$,  $Y^\rho / X^\lambda = (Y^\rho . X_\lambda)$.    (IX. 4. 2)




(ii) By combining ratios, we get such results as

    $(Z_\sigma . X^\lambda) = Z_\sigma / X_\lambda = Z_\sigma / Y_\rho \, . \, Y_\rho / X_\lambda = (Z_\sigma . Y^\rho)(Y_\rho . X^\lambda)$.    (IX. 4. 3)

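(iii) To find $Y^\rho$, suppose that

    $Y_\rho = b_{\rho\nu} X^\nu$.    (1)

Let the conjugate set be

    $Y^\rho = k^{\rho\mu} X_\mu$.

Then

    $|^{\rho}_{\sigma} = (Y^\rho . Y_\sigma) = (k^{\rho\mu} X_\mu . b_{\sigma\nu} X^\nu) = k^{\rho\mu} b_{\sigma\nu} |^{\nu}_{\mu} = k^{\rho\mu} b_{\sigma\mu}$,

whence, by reciprocation,

    $k^{\rho\mu} = b^{\rho\mu}$.

Thus the conjugate set is

    $Y^\rho = b^{\rho\mu} X_\mu$.    (2)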

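(iv) Similarly, if

    $Y_\epsilon = b_{\epsilon\nu} c^{\nu\rho} d_{\rho\sigma} X^\sigma$,

then

    $Y^\epsilon = b^{\epsilon\nu} c_{\nu\rho} d^{\rho\sigma} X_\sigma$,

etc.; in other words, the conjugate of an inner product is the
inner product of the reciprocals (for the coefficient sets) or the
conjugate (for the single set) of the factors. [Let us write

    $C^\epsilon \equiv b^{\epsilon\nu} c_{\nu\rho} d^{\rho\lambda} X_\lambda$.

Then

    $(C^\epsilon . Y_\zeta) = b^{\epsilon\nu} b_{\zeta\mu} c_{\nu\rho} c^{\mu\tau} d^{\rho\lambda} d_{\tau\sigma} (X_\lambda . X^\sigma) = |^{\epsilon}_{\zeta}$,

since $(X_\lambda . X^\sigma) = |^{\sigma}_{\lambda}$. Hence

    $C^\epsilon = Y^\epsilon$.

We might, alternatively, have deduced this from (iii) by means of
(VI. 10. 2).]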


(v) From (IX. 4. 1)

    $Y_\rho / X_\lambda = (Y_\rho . X^\lambda) = (X^\lambda . Y_\rho) = X^\lambda / Y^\rho$;

and therefore, by VII. 3 (iv),

    $Y_\rho Y^\rho = X_\lambda X^\lambda$.    (IX. 4. 4)

Thus conjugate sets are contragredient (VII. 4); and the
inner product of a set and its conjugate is the same for
all linearly related sets. If we denote this inner product
by $Q$, then the sets $X_\lambda$, $X^\lambda$, $Y_\rho$, $Y^\rho$ are connected by four
relations of the form

—e=Lp_(X Y ) = • fix. 4. 5) 

We can express $Q$ in such forms as

    $Q = X_\lambda X^\lambda = (X_\lambda . X_\mu) X^\lambda X^\mu = (X^\lambda . X^\mu) X_\lambda X_\mu$,

the last of which, when written out in full, is

    $f^{11} X_1 X_1 + 2 f^{12} X_1 X_2 + f^{22} X_2 X_2 + 2 f^{13} X_1 X_3 + \dots + f^{mm} X_m X_m$,

or in more general forms such as

    $Q = (C^\lambda . D^\mu) C_\lambda D_\mu$,

where $C_\lambda$ and $D_\lambda$ are any linear functions of $X_\lambda$. It must
be remembered that the invariability of $Q$ only applies for
the particular values $X_1, X_2, \dots X_m$. If there were a different
set of X's there would (in general) be a different $Q$.

IX. 5. The frequency-quadratic. — (i) In most of the 
cases we are considering, whether of the first or of the second 
kind mentioned in § 1 (ii), the frequency of joint occur- 
rence of values lying within limits 

    $X_1 \pm \tfrac{1}{2} dX_1$,  $X_2 \pm \tfrac{1}{2} dX_2$,  $\dots$  $X_m \pm \tfrac{1}{2} dX_m$

is proportional to

    $e^{-\frac{1}{2} P} \, dX_1 dX_2 \dots dX_m$,

where, if $x_1, x_2, \dots x_m$ are the deviations of $X_1, X_2, \dots X_m$
from their respective means, $P$ is of the form

    $P = a^{11} x_1 x_1 + 2 a^{12} x_1 x_2 + a^{22} x_2 x_2 + 2 a^{13} x_1 x_3 + \dots + a^{mm} x_m x_m$.    (1)

In our notation this becomes

    $P = a^{\lambda\mu} x_\lambda x_\mu$,    (2)

where

    $a^{\lambda\mu} = a^{\mu\lambda}$.    (3)


(ii) Let us write

    $E^\lambda \equiv a^{\lambda\mu} x_\mu = a^{\lambda 1} x_1 + a^{\lambda 2} x_2 + \dots + a^{\lambda m} x_m = \tfrac{1}{2} \, \partial P / \partial x_\lambda$.    (4)

Then it can be shown (see VIII. 4 (iii)) that

    mean value of $E^p x_q$ $= 1$ if $q = p$, $= 0$ if $q \neq p$.    (5)

Taking $(B_p . C_q)$ to mean, for these cases, the mean product
of $B_p$ and $C_q$, it follows from (5) that $E^\lambda$ is conjugate to
$x_\lambda$, i.e.

    $E^\lambda = x^\lambda$.    (6)

Hence, by (4),

    $x^\lambda = a^{\lambda\mu} x_\mu$,

and therefore, by (IX. 3. 10), or by comparison with
(IX. 3. 4),

    $a^{\lambda\mu} = x^\lambda / x_\mu = (x^\lambda . x^\mu)$.    (7)

(iii) Thus the a's in the expression for $P$ given in (1)
are the mean squares and mean products of the elements of
the conjugate set. Similarly, if we expressed $P$ in terms
of the conjugate set, the coefficients would be the mean
squares and mean products of the elements of the original
set; i.e.

    $P = a^{\lambda\mu} x_\lambda x_\mu = x^\lambda x_\lambda = a_{\lambda\mu} x^\lambda x^\mu$    (8)
      $= a_{11} x^1 x^1 + 2 a_{12} x^1 x^2 + a_{22} x^2 x^2 + 2 a_{13} x^1 x^3 + \dots$,

where

    $a_{\lambda\mu} = (x_\lambda . x_\mu)$.    (9)


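(iv) Take, for example, the case of two quantities $X_1$, $X_2$, whose
standard deviations and coefficient of correlation are $c_1$, $c_2$, and $r$.
Then it is well known that

    $P = \frac{1}{1 - rr} \left( \frac{x_1 x_1}{c_1 c_1} - 2r \frac{x_1 x_2}{c_1 c_2} + \frac{x_2 x_2}{c_2 c_2} \right)$.

This gives, for the members of the conjugate set,

    $x^1 = \left( \frac{x_1}{c_1} - r \frac{x_2}{c_2} \right) \times \frac{1}{c_1 (1 - rr)}$,  $x^2 = \left( -r \frac{x_1}{c_1} + \frac{x_2}{c_2} \right) \times \frac{1}{c_2 (1 - rr)}$.

It is easily verified that the mean square of $x^1$ is equal to the
coefficient of $x_1 x_1$ in $P$, and so on.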
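The verification mentioned here amounts to forming the reciprocal of the mean-product set for two variables. The sketch below is only an illustration with hypothetical values of $c_1$, $c_2$ and $r$: it builds $f_{\lambda\mu}$ from the standard deviations and the coefficient of correlation, inverts it, and compares the result with the coefficients of $P$ in the familiar two-variable form.

```python
from fractions import Fraction

# Hypothetical standard deviations and coefficient of correlation.
c1, c2, r = Fraction(2), Fraction(3), Fraction(1, 2)

# Mean-product set of x_1, x_2.
f = [[c1 * c1, r * c1 * c2],
     [r * c1 * c2, c2 * c2]]

# Reciprocal set, i.e. the mean squares and products of the conjugate set.
det = f[0][0] * f[1][1] - f[0][1] * f[1][0]
f_inv = [[f[1][1] / det, -f[0][1] / det],
         [-f[1][0] / det, f[0][0] / det]]

# Coefficients of x_1 x_1, x_1 x_2 and x_2 x_2 in the two-variable form of P
# (the x_1 x_2 entry is half the coefficient of the cross term).
k = 1 - r * r
expected = [[1 / (c1 * c1 * k), -r / (c1 * c2 * k)],
            [-r / (c1 * c2 * k), 1 / (c2 * c2 * k)]]

print(f_inv == expected)
```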


(v) If we express the x's linearly in terms of a new set
$y_\lambda$, the value of $P$ will remain unaltered. We can put this
differently as follows. Suppose that $y_\lambda$ is any linear function
of $x_\lambda$. Let $h_{\lambda\mu} \equiv (y_\lambda . y_\mu)$; and let $h^{\lambda\mu}$ be the reciprocal
of $h_{\lambda\mu}$. Then, if we write

    $y^\lambda = h^{\lambda\mu} y_\mu$,

we shall have

    $y^\lambda y_\lambda = P = x^\lambda x_\lambda$.

(vi) The $(x_q . x_q)$ or $(x_q . x_r)$ which we have so far been
considering is the mean square of $x_q$, or the mean product
of $x_q$ and $x_r$, without regard to the values that each of the
other x's may have; i.e. the mean square or mean product
taken for all possible values of these other x's according to
their relative frequencies. We may also want to know
what happens when some of the x's have definite values
ascribed to them and are not allowed to vary from these
values. In these cases we follow the principle of VI. 11 (ii).
Suppose that all the x's after $x_k$ are fixed. Let the x's up
to $x_k$ be denoted by $x_\alpha$ or $x_\beta$ etc. Then our methods apply
to the set of order $k$ formed by these x's. The principle,
therefore, is as follows. Suppose we want to study the
variation of the $k$ quantities $x_\alpha$ when the $m - k$ quantities
$x_\phi$ are fixed. We first construct the mean-product set $f_{\lambda\mu}$
($= a_{\lambda\mu}$) for the $m$ quantities $x_\alpha$ and $x_\phi$; then construct the
reciprocal set $f^{\lambda\mu}$ ($= a^{\lambda\mu}$), the elements of which are the
coefficients in the terms in $P$; then take out the partial set
$f^{\alpha\gamma}$ corresponding to $x_\alpha$; and then find the set $(f_{\alpha\gamma})_k$ which
is the reciprocal of $f^{\alpha\gamma}$. The result is the mean-product
set of $x_\alpha$ when $x_\phi$ is fixed. (This is a well-known theorem,
but is usually expressed in terms of determinants.)
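The four steps of this principle can be sketched for a small case. The sketch is only an illustration: the mean-product set `f` is hypothetical, with $m = 3$ and $k = 2$, so that a single x is held fixed. The result of the steps is compared with the usual expression for the same quantity, $f_{\alpha\gamma} - f_{\alpha\phi} f_{\phi\gamma} / f_{\phi\phi}$, which is the form the theorem takes when there is one fixed variable; that comparison formula is brought in here as a known result, not taken from the text.

```python
from fractions import Fraction


def inverse(m):
    """Reciprocal of a square set of Fractions, by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Step 1: hypothetical mean-product set for the m = 3 quantities.
f = [[Fraction(4), Fraction(3), Fraction(1)],
     [Fraction(3), Fraction(9), Fraction(2)],
     [Fraction(1), Fraction(2), Fraction(5)]]
k = 2

# Step 2: the reciprocal set, whose elements are the coefficients in P.
f_inv = inverse(f)

# Step 3: take out the partial set corresponding to the first k quantities.
partial = [row[:k] for row in f_inv[:k]]

# Step 4: its reciprocal is the mean-product set when x_3 is fixed.
fixed = inverse(partial)

# Known form of the same result for a single fixed variable.
expected = [[f[a][g] - f[a][2] * f[2][g] / f[2][2] for g in range(k)]
            for a in range(k)]
print(fixed == expected)
```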

(vii) If the partial sets $x_\alpha$ and $x_\phi$ in (vi) are independent,
so that $(x_\alpha . x_\phi) = 0$, the elements of the mean-product set
outside the portions corresponding to $(x_\alpha . x_\gamma)$ and $(x_\phi . x_\psi)$
will all be 0, and (see § 3 (ix)) the values of $f^{\alpha\gamma}$ will be
the same whether we construct them from the whole set
or from the partial set. This is otherwise obvious; for, if
the x's of $x_\alpha$ vary independently of those of $x_\phi$, they vary
in the same way whether the latter are fixed or vary.

IX. 6. Criteria for improved values. — (i) Our next 
problem, considered in this and the following section, is that 
of reduction of error, in a case of the second kind mentioned 
in § 1 (ii). We have a set of quantities 

    $D_\lambda \equiv (D_1 \, D_2 \, D_3 \dots D_m)$

which contain errors; the mean products of error being

    $(D_\lambda . D_\mu)$.    (IX. 6. A)

The whole set $D_\lambda$ consists of two portions

    $D_\alpha \equiv (D_1 \, D_2 \dots D_k)$,  $D_\phi \equiv (D_{k+1} \, D_{k+2} \dots D_m)$.

All the D's are the results (direct or indirect) of observation;
but the true values of the $D_\phi$ are negligible (within
the degree of accuracy to which we are working), and, if $U$
is any one of the D's, or any linear function of them, we
can add to it any linear function* of $D_\phi$ without altering,
except to a negligible extent, the true value which it represents.
If the sum of $U$ and an indeterminate linear function
of $D_\phi$ is represented by $U'$, the problem of reduction of
error is to determine this linear function so as to make
$(U' . U')$ a minimum. The resulting value of $U'$ is called
the improved value of $U$, and will be denoted by $IU$. The
elements of $D_\phi$ are called the auxiliaries. We can replace
$\alpha$ by $\beta, \gamma, \dots$, and $\phi$ by $\chi, \psi, \dots$, as occasion requires.

* This is, of course, an incomplete statement. We could replace
$U$ by any function of $U$ and $D_\phi$ which would be equal to $U$ if the
$D_\phi$ were all 0. But we are only considering linear functions.

(ii) The way in which this problem arises is as follows. We
start with a set of observed quantities $X_1, X_2, \dots X_m$, which
correspond to a series of values of some other quantity $t$ at equal
or unequal intervals; the X's might, e.g., be rates of mortality at
different ages. The X's contain errors; and our fundamental
assumption, based on general experience and on inspection of the
particular data, is that the true values are so nearly of the form
(in ordinary notation) $c_0 + c_1 t + c_2 t^2 + \dots + c_{k-1} t^{k-1}$ that their
differences (divided differences if the values of $t$ are at unequal
intervals) after the $(k - 1)$th are negligible. We may therefore add
to each X any linear function of these differences, which are what
we are calling $D_{k+1}, D_{k+2}, \dots D_m$. The problem is to determine
the coefficients in this linear function so that the mean square of
error of the sum of the X and the linear function shall be
a minimum.

(iii) We have first to see what relations hold between
the two portions of $D_\lambda$ and the two portions of its conjugate
set when similarly divided. Denoting the conjugate
set, as usual, by $D^\lambda$, let the two portions be

    $D^\alpha \equiv (D^1 \, D^2 \dots D^k)$,  $D^\phi \equiv (D^{k+1} \, D^{k+2} \dots D^m)$.

Here it is to be observed that $D^\alpha$ is not (in general)
the set (order $k$) conjugate to the set $D_\alpha$ (order $k$),
since each element of it is a linear function of the
whole $D_\lambda$; and similarly for $D^\phi$. Now the condition of
conjugacy is

    $(D^p . D_q) = |^{p}_{q}$.

But, if $D^p$ and $D_q$ belong to non-corresponding portions
of the two sets, $q$ cannot be equal to $p$. Hence we get
the relations

    $(D^\alpha . D_\beta) = |^{\alpha}_{\beta}$ ... (1),  $(D^\alpha . D_\phi) = 0$ ... (2),

    $(D^\phi . D_\alpha) = 0$ ... (3),  $(D^\phi . D_\chi) = |^{\phi}_{\chi}$ ... (4).
(iv) First let $U$ be an element of $D_\phi$ or a linear function
of $D_\phi$, say $a^\phi D_\phi$. Then its improved value must be
$U - U = 0$; i.e.

    $I(a^\phi D_\phi) = 0$.    (IX. 6. 1)

For this makes $(U' . U') = (0 . 0) = 0$, by (IX. 1. 3); and, by
(IX. 1. 2), $(U' . U')$ would be $> 0$ if $U'$ were not $= 0$. Hence
$(U' . U')$ is a minimum when $U' = 0$.

(v) The next most simple case is that in which $U$ is an
element of $D^\alpha$ or a linear function of $D^\alpha$, say

    $U = a_\alpha D^\alpha$.    (1)

Let the value of $U'$ be

    $U' = U + u$,

where

    $u = c^\phi D_\phi$.    (2)

Then, by (IX. 1. 1),

    $(U' . U') = (U . U) + 2 (U . u) + (u . u)$.

But, by (1) and (2), and by (2) of (iii) above,

    $(U . u) = a_\alpha c^\phi (D^\alpha . D_\phi) = 0$.

Hence $(U' . U')$ is a minimum when $(u . u)$ is a minimum;
and this, by (IX. 1. 2), is when $u = 0$, so that

    $U' = U$.

Hence the improved value is the same as the original value;

    $I(a_\alpha D^\alpha) = a_\alpha D^\alpha$.    (IX. 6. 2)

(vi) The simplicity of the results obtained in (iv) and (v)
suggests that we should in all cases regard $U$ as expressed
in terms of $D^\alpha$ and $D_\phi$. There is no difficulty about this,
in theory, whatever linear function of the D's $U$ may be.
If, for instance, $U$ is given as a linear function of $D_\lambda$, then
we obtain our result by eliminating $D_\alpha$ ($k$ values) between
this formula for $U$ and the $k$ equations which give $D^\alpha$ in
terms of $D_\lambda$, i.e. in terms of $D_\alpha$ and $D_\phi$. Suppose then
that

    $U = V + W$,

where

    $V = b_\alpha D^\alpha$,  $W = b^\phi D_\phi$.

Then $U'$ is formed from $U$ by adding some linear function
of $D_\phi$, so that

    $U' = V + W'$,

where $V = b_\alpha D^\alpha$ as before, and $W'$ is of the form

    $W' = c^\phi D_\phi$.

Hence

    $(U' . U') = (V . V) + 2 (V . W') + (W' . W')$.

But

    $(V . W') = (b_\alpha D^\alpha . c^\phi D_\phi) = b_\alpha c^\phi (D^\alpha . D_\phi) = 0$,

by (2) of (iii). Hence

    $(U' . U') = (V . V) + (W' . W')$.

But this is a minimum when $W' = 0$. Hence

    $I(b_\alpha D^\alpha + b^\phi D_\phi) = b_\alpha D^\alpha$.    (IX. 6. 3)

In other words, if we express $U$ in terms of $D^\alpha$ and $D_\phi$,
the improved value of $U$ is found by omitting the part
involving $D_\phi$.
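The result (IX. 6. 3) can be sketched for a small case. The following is an illustration under stated assumptions: the mean-product set `f` of $D_1, D_2, D_3$ is hypothetical, there is a single auxiliary $D_3$ ($k = 2$), and every linear function of the D's is represented by its vector of coefficients, so that the ( . ) of two such functions is a bilinear form in `f`. The improved value is found directly by minimising $(U' . U')$, and is then rebuilt from the conjugate elements $D^1$, $D^2$ alone.

```python
from fractions import Fraction


def inverse(m):
    """Reciprocal of a square set of Fractions, by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dot(f, u, v):
    """The ( . ) of two linear functions of the D's with coefficients u and v."""
    n = len(u)
    return sum(u[i] * f[i][j] * v[j] for i in range(n) for j in range(n))


# Hypothetical mean products of error of D_1, D_2, D_3; D_3 is the auxiliary.
f = [[Fraction(4), Fraction(3), Fraction(1)],
     [Fraction(3), Fraction(9), Fraction(2)],
     [Fraction(1), Fraction(2), Fraction(5)]]
k = 2
aux = [Fraction(0), Fraction(0), Fraction(1)]      # coefficients of D_3
w = [Fraction(2), Fraction(-1), Fraction(3)]       # U = 2 D_1 - D_2 + 3 D_3

# Direct minimisation of (U' . U') over U' = U + c D_3.
c = -dot(f, w, aux) / dot(f, aux, aux)
improved = [wi + c * ai for wi, ai in zip(w, aux)]

# Rebuild the improved value as b_alpha D^alpha, the rows of the reciprocal
# set being the coefficient vectors of the conjugate elements.
f_inv = inverse(f)
units = [[Fraction(int(i == a)) for i in range(3)] for a in range(k)]
b = [dot(f, improved, units[a]) for a in range(k)]
rebuilt = [sum(b[a] * f_inv[a][i] for a in range(k)) for i in range(3)]

print(improved, rebuilt)
print(dot(f, w, w), dot(f, improved, improved))
```

With these figures the part omitted is the multiple of $D_3$, and the ( . ) of the improved value with the auxiliary comes out as 0.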

(vii) Since, by (IX. 6. 3), $IU$ is a linear function of $D^\alpha$,
and, by (2) of (iii), $(D^\alpha . D_\phi) = 0$, it follows, by (IX. 1. 4),
that

    $(IU . D_\phi) = 0$.    (IX. 6. 4)

In other words, the ( . ) of any improved value and each
of the auxiliaries is 0.




(viii) It also follows from (IX. 6. 3) that if two quantities 
differ by a linear function of the auxiliaries they have the 
same improved value. 

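(ix) By taking $U$ in (vi) to be each member, in turn,
of a set $B_\lambda$ of linear functions of the D's expressed in the
form

    $B_\lambda = b_{\lambda\alpha} D^\alpha + b_\lambda^{\;\phi} D_\phi$,

we find that

    $IB_\lambda = b_{\lambda\alpha} D^\alpha$.

Also

    $I(k^\lambda B_\lambda) = I(k^\lambda b_{\lambda\alpha} D^\alpha + k^\lambda b_\lambda^{\;\phi} D_\phi) = k^\lambda b_{\lambda\alpha} D^\alpha = k^\lambda IB_\lambda$;    (IX. 6. 5)

i.e. the improved value of any linear function of the B's is
the same linear function of their improved values.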

(x) Altering k K in (IX. 6. 5) to kf and writing = 
k^ we find that 



i. e. the improved values of two linearly connected sets are 
related in the same way as the original sets ; or, more 
briefly, a set and its improved values are cogredient. 

(xi) Since we know that the improved values of $D_\phi$
are 0, we have really only to determine those of k other
quantities. In view of (IX. 6. 5), we can choose these to
be any linear functions of $D_\alpha$ that we like, with or
without linear functions of $D_\phi$ added; and similarly we
can replace $D_\phi$ by any linear functions of $D_\phi$: provided,
in both cases, that none of the mean-product determinants
are 0. The functions so chosen can be called $D_\alpha$ and
$D_\phi$, so that we need only consider the problem of find-
ing $ID_\alpha$.

(xii) The result stated in (IX. 6. 2) gives us the extension, to the
general case in which the errors of the original observations may
have any mean squares and mean products, of the ‘method of
moments’ ordinarily applied to the case of a self-conjugate set
(§ 3 (vii)). We take $X_\lambda$, as in (ii), to be the original observations,
and $D_\lambda$ to be their differences of successive orders. Then we have
found that the improved value of any linear function of $D^\alpha$ is the
same as the original value. But, by VII. 3 (v), $D^1, D^2, \ldots, D^k$ are
successive sums of the elements of $X^\lambda$, the set conjugate to $X_\lambda$;
and the first k moments of $X^\lambda$ are linear functions of these sums.
Hence the improved values of these moments are the same as their
original values; and this, by (ix), is the same thing as saying that
the moments of the improved values of $X^\lambda$ are equal to the
moments of the original values. Thus the method of moments
still applies. But it should be observed that it does not apply to
the original set of observations, but to the conjugate set.

As a simple example, suppose that $X_\lambda$ is a set of independent
observations of a single quantity, the mean square of error of $X_p$
being $c_p c_p$ (that is, $c_p^2$, with no summation implied). Then
(§ 3 (viii)) the conjugate set is

$$X^p = X_p / c_p^2.$$

As the X's will all have the same improved value, which we will
call IX, there is only one moment to be considered, namely, the
0th moment, or sum, of the conjugate set. Hence, equating the
sums of original and of improved values,

$$\sum_p X_p / c_p^2 = IX \sum_p 1 / c_p^2,$$

which gives the familiar result.
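
The arithmetic of this example may be sketched numerically. The fragment below is an illustration only: the function name `improved_common_value` and the sample figures are hypothetical, and it assumes that the conjugate of each observation is the observation divided by its own mean square of error, as is appropriate for independent observations.

```python
from fractions import Fraction


def improved_common_value(observations, mean_squares):
    """Common improved value IX of independent observations of one quantity.

    Assumption: the conjugate of X_p is X_p divided by its mean square of
    error, and IX is fixed by equating the sum of the conjugate set for the
    original values with the same sum for the improved values.
    """
    conjugate = [x / c2 for x, c2 in zip(observations, mean_squares)]
    unit_conjugate = [1 / c2 for c2 in mean_squares]  # conjugates when X_p = 1
    return sum(conjugate) / sum(unit_conjugate)


# Hypothetical data: three observations with mean squares of error 1, 4, 2.
X = [Fraction(10), Fraction(12), Fraction(11)]
c2 = [Fraction(1), Fraction(4), Fraction(2)]
IX = improved_common_value(X, c2)
print(IX)
```

The value so obtained is the ordinary weighted mean, each observation being weighted inversely as its mean square of error.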

IX. 7. Determination of improved values. — 

(i) From the results obtained in the preceding section we
deduce three methods of finding the improved value of any
element of $D_\alpha$, say $D_f$.

(1) We can express $D_f$ in terms of $D^\alpha$ and $D_\phi$, and then
omit the part involving $D_\phi$. The result is $ID_f$.

(2) We can say that $ID_f$ is some linear function of $D^\alpha$.
This linear function has k coefficients to be determined;
they are determined by the condition that, if the linear
function is expressed in terms of $D_\lambda$, the coefficient of
$D_f$ is 1 and those of other elements of $D_\alpha$ are 0. The
practical application of this method depends on the
circumstances of the particular class of cases.

(3) We can say that lDj is obtained from I) f - hy adding 
a linear function of D ip , which we have called — D ip . 
This linear function has m − k coefficients to be determined.
We have found in (IX. 6. 4) that $(ID_f . D_\phi) = 0$; this gives
m − k equations, from which the coefficients in question can
be determined. Thus the necessary and sufficient conditions
for $ID_f$ are that it differs from $D_f$ by a linear function of
$D_\phi$ and that $(ID_f . D_\phi) = 0$.

These three methods are exhibited in (ii), (iv), and (v) 
below, and a fourth method is given in (vii). 

(ii) To apply the first method, let us write

$$D_\alpha = e_{\alpha\beta} D^\beta + e_\alpha^\phi D_\phi.$$

We do not need $e_\alpha^\phi$, since the only part of $D_\alpha$ that counts
for the improved value is $e_{\alpha\beta} D^\beta$; we therefore get rid of
$e_\alpha^\phi$ at once, by means of something whose $( . D_\phi)$ is 0.
This, by (2) of § 6 (iii), is $D^\gamma$. Taking the $(D^\gamma . )$ of both
sides, we have

$$(D^\gamma . D_\alpha) = e_{\alpha\beta} (D^\gamma . D^\beta) = e_{\alpha\beta} d^{\gamma\beta}.$$

Also, by (1) of § 6 (iii),

$$(D^\gamma . D_\alpha) = |^\gamma_\alpha.$$

Hence

$$e_{\alpha\beta} d^{\gamma\beta} = |^\gamma_\alpha.$$

Here α, β, γ relate to the partial set $(D^1 D^2 \ldots D^k)$,
and the statement is limited to this set. Dealing only
with this set, let us denote the reciprocal of $d^{\gamma\beta}$ by
$(d_{\gamma\beta})_k$; this, as pointed out in VI. 1 f (ii) (cf. § 5 (vi) of the
present chapter) is not ordinarily the same thing as $d_{\gamma\beta}$ as
obtained from the whole set of order m. We have then,
by reciprocation,

$$e_{\alpha\beta} = (d_{\beta\gamma})_k \, |^\gamma_\alpha = (d_{\beta\alpha})_k.$$

Substituting in the expression for $D_\alpha$, and dropping the
$e_\alpha^\phi$ in order to get the improved value, we have

$$ID_\alpha = (d_{\beta\alpha})_k D^\beta. \qquad \text{(IX. 7. 1)}$$

(iii) Although, in the above, we have not needed $e_\alpha^\phi$, we
ought to find its value in order to satisfy ourselves that,
as has been stated in § 6 (vi), any linear function of $D_\lambda$,
say $g^\alpha D_\alpha + g^\phi D_\phi$, can be expressed as a linear function of
$D^\alpha$ and $D_\phi$; to do this, it is only necessary to prove the
proposition for $D_\alpha$, since the formula for $g^\alpha D_\alpha + g^\phi D_\phi$ will
follow at once.

We have written

$$D_\alpha = e_{\alpha\beta} D^\beta + e_\alpha^\phi D_\phi,$$

and have found $e_{\alpha\beta}$. To find $e_\alpha^\phi$, we must get rid of the
first term; so we again use (2) of § 6 (iii), getting

$$(D_\alpha . D_\chi) = e_\alpha^\phi (D_\phi . D_\chi)$$

or

$$d_{\alpha\chi} = e_\alpha^\phi d_{\phi\chi}.$$

Hence, by reciprocation,

$$e_\alpha^\phi = [d^{\phi\chi}]_{m-k} \, d_{\alpha\chi},$$

where $[d^{\phi\chi}]_{m-k}$ is the reciprocal of $d_{\phi\chi}$ obtained from the
partial set $(D_{k+1} D_{k+2} \ldots D_m)$. The complete expression
for $D_\alpha$ is therefore

$$D_\alpha = (d_{\beta\alpha})_k D^\beta + [d^{\phi\chi}]_{m-k} \, d_{\alpha\chi} D_\phi. \qquad \text{(IX. 7. 2)}$$

The existence of $(d_{\beta\alpha})_k$ and $[d^{\phi\chi}]_{m-k}$ is dependent on the
assumption that the determinant $|d^{qr}|$ formed for $D^\alpha$,
and the determinant $|d_{qr}|$ formed for $D_\phi$, are both ≠ 0
(see § 1 (viii)).


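(iv) To use the second method, we might have proceeded
as follows. We write

$$ID_\alpha = e_{\alpha\beta} D^\beta.$$

To find $e_{\alpha\beta}$, we express $D^\beta$ in terms of $D_\mu$, i. e. of $D_\gamma$ and $D_\phi$,
by means of (IX. 3. 4), and we have

$$ID_\alpha = e_{\alpha\beta} d^{\beta\gamma} D_\gamma + e_{\alpha\beta} d^{\beta\phi} D_\phi.$$

From the condition stated in (2) of (i), it follows that

$$e_{\alpha\beta} d^{\beta\gamma} = |^\gamma_\alpha,$$

and therefore, by reciprocation,

$$e_{\alpha\beta} = (d_{\beta\gamma})_k \, |^\gamma_\alpha = (d_{\beta\alpha})_k.$$

Hence we get the same result as before, namely,

$$ID_\alpha = (d_{\beta\alpha})_k D^\beta.$$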


(v) For the third method, we write

$$ID_\alpha = D_\alpha - e_\alpha^\phi D_\phi,$$

and we have to find $e_\alpha^\phi$. The condition stated in (3) of (i),
namely,

$$(ID_\alpha . D_\chi) = 0,$$

gives

$$d_{\alpha\chi} = (D_\alpha . D_\chi) = e_\alpha^\phi (D_\phi . D_\chi) = e_\alpha^\phi d_{\phi\chi}.$$

This is true for all the m − k values of χ. By reciprocation

$$e_\alpha^\phi = [d^{\phi\chi}]_{m-k} \, d_{\alpha\chi}.$$

This agrees with (IX. 7. 1) and (IX. 7. 2). As m − k will
usually be a good deal greater than k, the method is rather
of theoretical than of practical interest.
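
The agreement of the first and third methods can be checked on a small numerical case. The following Python sketch is illustrative only: the mean-product set `d`, the choice k = 2, and the helper names `inverse` and `matmul` are assumptions made for the example. It takes the conjugate set to be the reciprocal of the mean-product set applied to the original set, and represents every quantity by its coefficients on $D_1 \ldots D_4$.

```python
from fractions import Fraction


def inverse(matrix):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical mean-product set d of D_1..D_4 (symmetric, positive definite).
d = [[Fraction(v) for v in row] for row in
     [[4, 1, 1, 0], [1, 5, 0, 1], [1, 0, 3, 1], [0, 1, 1, 4]]]
k = 2                       # D_1, D_2 retained; D_3, D_4 are the auxiliaries
recip = inverse(d)          # reciprocal set; row b gives the conjugate D^b

# First method: ID_a = (reciprocal of the k-by-k block of recip) times D^b.
block = [row[:k] for row in recip[:k]]
first = matmul(inverse(block), recip[:k])       # coefficients on D_1..D_4

# Third method: ID_a = D_a - e D_phi, with (ID_a . D_chi) = 0.
cross = [row[k:] for row in d[:k]]              # (D_a . D_chi)
aux = [row[k:] for row in d[k:]]                # (D_phi . D_chi)
e = matmul(cross, inverse(aux))
third = [[Fraction(int(i == j)) for j in range(k)] + [-v for v in e[i]]
         for i in range(k)]

# Mean products of the improved values with the auxiliaries.
with_aux = [row[k:] for row in matmul(first, d)]
print(first == third, with_aux)
```

Exact fractions are used so that the two sets of coefficients can be compared for equality rather than to within a tolerance.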


(vi) The elements which we have found to be important
in the above processes are the m − k auxiliaries $D_\phi$, whose
improved values are all 0, and the k elements $D^\alpha$ of the
conjugate set which correspond to the remainder of $D_\lambda$.
These elements together constitute a set of order m; and
we have in fact, in (iii), expressed $D_\alpha$ in terms of this
set. As the set is important, it is worth while to see
what is its conjugate.

We write this set as

$$D^\alpha \,\&\, D_\phi,$$

where the ‘&’ means that the elements of the two sets of
orders k and m − k are combined to form a set of order m.
These two partial sets are statistically independent. It
follows, by § 3 (ix), that the set conjugate to it is

$$(d_{\alpha\beta})_k D^\beta \,\&\, [d^{\phi\chi}]_{m-k} D_\chi.$$

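(vii) But ($d_{\alpha\beta}$ and $d_{\beta\alpha}$ being identical) we have already
found that

$$ID_\alpha = (d_{\beta\alpha})_k D^\beta.$$

Hence we get a concise formula for finding $ID_\alpha$. Let the
set conjugate to $D_\alpha \,\&\, D_\phi$ be $D^\alpha \,\&\, D^\phi$; and let the set
conjugate to $D^\alpha \,\&\, D_\phi$ be $F_\alpha \,\&\, F^\phi$. Then $F_\alpha = ID_\alpha$.

(viii) Since $ID_\alpha$ is of the form $D_\alpha - e_\alpha^\phi D_\phi$, and $ID_\beta$ is of
the form $e_{\beta\gamma} D^\gamma$, and $(D^\gamma . D_\phi) = 0$, it follows that

$$(ID_\alpha . ID_\beta) = (D_\alpha . ID_\beta),$$

and similarly

$$(ID_\alpha . ID_\beta) = (ID_\alpha . D_\beta).$$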

[Note.— This chapter is based on (1) a paper by myself in Phil. 
Trans. (1920), ser. A, vol. 221, pp. 199-237, in which the old nota- 
tion was used ; (2) a paper by Professor Eddington in Proceedings 
of the London Mathematical Society, ser. 2, vol. 20, pp. 213-221, show- 
ing how the notation and methods of the tensor calculus can be 
applied, and making some abbreviations and improvements in the 
work ; and (3) a note by myself, following the above, ibid., pp. 222- 
224. I have altered the notation a good deal.] 


X. TENSORS IN THEORY OF RELATIVITY 


X. 1. Preliminary. (i) Tensors are sets* which (1) are
functions of a set of co-ordinates $(x_1\, x_2\, x_3 \ldots)$ and (2) are
subject to certain conditions of transformation when the
co-ordinates are transformed.

* It must be remembered that the elements of a set are not
necessarily numbers, but may be quantities; and that a set as
a whole is something different from its elements. What we
usually mean by a tensor is some physical phenomenon represented
by a set: but no confusion arises if we call the set itself a tensor.

(ii) In the theory of relativity there are four co-ordinates
$(x_1\, x_2\, x_3\, x_4)$, so that all the sets are of order 4, and any inner
multiplication with regard to a suffix μ involves addition
of the products for the values 1, 2, 3, 4 of μ. But this fixing
of the number of elements in a set does not affect the
general reasoning with regard to the sets, and we can
continue to treat them as of order m.

(iii) In VII. 4 we started with a set $X_\kappa$, and a set $A_\kappa$
which is a definite function of $X_\kappa$, and we supposed $A_\kappa$ to
be changed as the result of $X_\kappa$ being changed by linear
substitution; and the cases we considered were those in
which, throughout all such changes, $A_\kappa$ varies either directly
or reciprocally as $X_\kappa$. In VII. 7 we extended the inquiry
by taking a set $\mathfrak{A}$ to be a function of two or more single
sets, and considered cases in which, when these sets are
changed by linear substitution, $\mathfrak{A}$ varies directly or
reciprocally as each of the sets. For tensors we have to
consider cases in which the primary substitutions are not
necessarily linear. If in place of the original set of
co-ordinates $x_\lambda$ we take a new set $x'_\rho$, which is a function
but not necessarily a linear function of $x_\lambda$, the ratio which
we have now to consider is not the ratio of $x'_\rho$ to $x_\lambda$, but
the ratio of their differentials, i. e. the partial differential
coefficient $\partial x'_\rho / \partial x_\lambda$. When the substitution is linear, this
is equal* to $x'_\rho / x_\lambda$, so that our treatment of the general
case is consistent with our previous treatment of the
particular case.

We will begin with the single set, and then go on to sets 
of higher rank. 

(iv) In the case of sets of higher rank, we sometimes
have to deal not only with inner products but also with
inner sums. By the inner sum, in such an expression as
$A^{\rho}_{\mu\nu\sigma}$, we mean the result obtained by replacing ρ by σ
and summing the values for σ = 1, 2, 3, 4. It will be seen
presently (§ 5 (iv)) that, as the result of the particular
notation adopted, the inner products or sums have only to
be considered when one of the two letters concerned is an
upper suffix and the other is a lower suffix.

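* The difficulty mentioned in the note to VIII. 1 (ii) does not
arise, because $\partial x'_\rho / \partial x_\lambda$ does not occur absolutely, but (directly or
reciprocally) as one of the factors in inner multiplication with
regard to λ or ρ.

X. 2. Single sets (vectors). (i) Beginning with single
sets, we start with a pair which we call $x_\lambda$ (the set of
co-ordinates) and $A^\lambda$, or $x_\alpha$ and $A^\alpha$; $A^\lambda$ or $A^\alpha$ being a
function of $x_\lambda$ or $x_\alpha$. (The α here, like the λ, denotes a complete
set, not, as in Chapter IX, a partial set.) Connected with
these, or arising out of them, is a plurality of pairs $x'_\kappa$ and $A'^\kappa$.
But our purview is limited to the cases in which the relation
between $A^\lambda$ and $A'^\kappa$ is linear, and in which, further, this
linear relation is of one of the two following kinds:

(a) where

$$\frac{A'^\kappa}{A^\alpha} = \frac{\partial x'_\kappa}{\partial x_\alpha};$$

(b) (replacing $A^\kappa$ and $A'^\kappa$ by $A_\kappa$ and $A'_\kappa$) where

$$\frac{A'_\kappa}{A_\alpha} = \frac{\partial x_\alpha}{\partial x'_\kappa}.$$

In the cases under (a) $A^\kappa$ is said to be a contravariant
vector; in the cases under (b) $A_\kappa$ is said to be a covariant
vector. Here ‘vector’ is used as meaning a single set
which is a tensor.

(ii) It will be seen from VIII. 1 (ii) that, if the relation
between $x_\lambda$ and $x'_\kappa$ is linear, these become respectively

$$\frac{A'^\kappa}{A^\alpha} = \frac{x'_\kappa}{x_\alpha}, \qquad \frac{A'_\kappa}{A_\alpha} = \frac{x_\alpha}{x'_\kappa},$$

so that in these particular classes of cases $A^\lambda$ is contravariant
if $A^\lambda$ and $x_\lambda$ are cogredient, and $A_\lambda$ is covariant if $A_\lambda$ and $x_\lambda$
are contragredient.*

* It seems desirable to call attention to these classes of cases, as
otherwise the tensor terminology may be found rather confusing.

X. 3. Other sets. (i) For sets of higher rank, we are
similarly concerned with pairs of partial derivatives

$$\frac{\partial x'_\mu}{\partial x_\alpha} \text{ and } \frac{\partial x_\alpha}{\partial x'_\mu}, \qquad \frac{\partial x'_\nu}{\partial x_\beta} \text{ and } \frac{\partial x_\beta}{\partial x'_\nu},$$

etc.; and a set depending on $x_\alpha, x_\beta, \ldots$ is not a tensor unless,
when $x_\alpha\, x_\beta \ldots$ become $x'_\mu\, x'_\nu \ldots$, the ‘ratio’ of the new
value of the set to the old value is the product of these
partial derivatives, one from each pair. The particular
derivatives are indicated by the position of the letters
αβ... or μν...: these are upper suffixes if, so far as the
particular variable is concerned, the relation is of the
contravariant type, and lower suffixes if the relation is
of the covariant type. Thus for double sets (tensors of the
second rank) we should have such relations as

$$\frac{A'^{\mu\nu}}{A^{\alpha\beta}} = \frac{\partial x'_\mu}{\partial x_\alpha}\,\frac{\partial x'_\nu}{\partial x_\beta} \quad \text{(contravariant tensor)},$$

$$\frac{A'_{\mu\nu}}{A_{\alpha\beta}} = \frac{\partial x_\alpha}{\partial x'_\mu}\,\frac{\partial x_\beta}{\partial x'_\nu} \quad \text{(covariant tensor)},$$

$$\frac{A'^{\nu}_{\mu}}{A^{\beta}_{\alpha}} = \frac{\partial x_\alpha}{\partial x'_\mu}\,\frac{\partial x'_\nu}{\partial x_\beta} \quad \text{(mixed tensor)},$$

and for a tensor of higher rank we might have such a
relation as

$$\frac{A'^{\rho}_{\lambda\mu\nu}}{A^{\delta}_{\alpha\beta\gamma}} = \frac{\partial x_\alpha}{\partial x'_\lambda}\,\frac{\partial x_\beta}{\partial x'_\mu}\,\frac{\partial x_\gamma}{\partial x'_\nu}\,\frac{\partial x'_\rho}{\partial x_\delta}.$$

Two tensors $\mathfrak{A}$ and $\mathfrak{B}$ are said to be of the same character
if the ratios $\mathfrak{A}'/\mathfrak{A}$ and $\mathfrak{B}'/\mathfrak{B}$ are of the same form.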


(ii) For a scalar function of a set or sets the above
condition becomes

$$A' = A,$$

so that a scalar (in the general sense) is not a tensor unless
it remains constant for all changes in the system of
co-ordinates. Such a function as $A_1 + A_2 + A_3 + A_4$, for instance,
would not usually be a tensor. For tensor purposes,
therefore, ‘scalar’ practically means ‘invariant’.
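
A small numerical case may make the distinction concrete. The Python sketch below is an illustration under stated assumptions: it uses only two co-ordinates, a hypothetical linear substitution `M`, and hypothetical vectors `A` and `B`; the helper name `apply` is likewise invented for the example. It contrasts the sum of the elements of a contravariant vector, which changes, with the inner product of a contravariant and a covariant vector, which is brought in here for contrast and does not.

```python
def apply(matrix, vector):
    """Inner product of each row of `matrix` with `vector`."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical linear change of co-ordinates x' = M x (two co-ordinates only).
M = [[2, 1], [1, 1]]            # M[k][a] = dx'_k / dx_a
M_inv = [[1, -1], [-1, 2]]      # M_inv[a][k] = dx_a / dx'_k (inverse of M)
M_inv_T = [list(col) for col in zip(*M_inv)]

A = [3, 5]                      # contravariant: A'^k = (dx'_k / dx_a) A^a
B = [2, 7]                      # covariant:     B'_k = (dx_a / dx'_k) B_a
A_new = apply(M, A)
B_new = apply(M_inv_T, B)

sum_old, sum_new = sum(A), sum(A_new)
inner_old = sum(a * b for a, b in zip(A, B))
inner_new = sum(a * b for a, b in zip(A_new, B_new))
print(sum_old, sum_new, inner_old, inner_new)
```

The sum of the elements alters with the co-ordinates, so it is not a scalar in the tensor sense, whereas the inner product keeps its value.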


X. 4. Reason for limitation. The object of limiting
the definition of ‘tensor’ in this way is to ensure that the
result of any number of steps, all of the same kind,
produced by successive transformations of co-ordinates, shall
be the same as if we had passed in one step, also of the
same kind, from the initial set of co-ordinates to the final
set. That this is in fact ensured is seen from the properties
of the partial derivative of a set. Suppose, for example,
that

$$\frac{A'^\lambda}{A^\alpha} = \frac{\partial x'_\lambda}{\partial x_\alpha}, \qquad (1)$$

and that

$$\frac{A''^\sigma}{A'^\lambda} = \frac{\partial x''_\sigma}{\partial x'_\lambda}. \qquad (2)$$

Then, by (VII. 2. 2) and (VIII. 3. 1),

$$\frac{A''^\sigma}{A^\alpha} = \frac{A''^\sigma}{A'^\lambda}\,\frac{A'^\lambda}{A^\alpha} = \frac{\partial x''_\sigma}{\partial x'_\lambda}\,\frac{\partial x'_\lambda}{\partial x_\alpha} = \frac{\partial x''_\sigma}{\partial x_\alpha}, \qquad (3)$$

which is of the same form as (1) and (2).

X. 5. Miscellaneous properties. The following are
some miscellaneous properties which are useful in
determining whether a set is a tensor.

(i) The sum (or difference) of two tensors of the same
rank and character and the same suffix is a tensor.

[Suppose, for instance, that $A^\lambda$ and $B^\lambda$ are contravariant
tensors. Then

$$\frac{A'^\lambda}{A^\alpha} = \frac{\partial x'_\lambda}{\partial x_\alpha}, \qquad \frac{B'^\lambda}{B^\alpha} = \frac{\partial x'_\lambda}{\partial x_\alpha},$$

and therefore

$$(A'^\lambda + B'^\lambda) = \frac{\partial x'_\lambda}{\partial x_\alpha}\,(A^\alpha + B^\alpha).$$

]


(ii) The product of two tensors is a tensor whose character 
is the combination of the characters of the two. For example, 
from 



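(iii) An inner product of two tensors, or an inner sum
(§ 1 (iv)) of a tensor, taken with regard to suffixes of
opposite character, is a tensor.

[Take, for example,

$$A'^{\rho}_{\lambda\mu\nu} = \frac{\partial x_\alpha}{\partial x'_\lambda}\,\frac{\partial x_\beta}{\partial x'_\mu}\,\frac{\partial x_\gamma}{\partial x'_\nu}\,\frac{\partial x'_\rho}{\partial x_\delta}\,A^{\delta}_{\alpha\beta\gamma}.$$

If we replace ρ by ν we have, by (VIII. 3. 3),

$$\frac{\partial x_\gamma}{\partial x'_\nu}\,\frac{\partial x'_\nu}{\partial x_\delta}\,A^{\delta}_{\alpha\beta\gamma} = |^\gamma_\delta \, A^{\delta}_{\alpha\beta\gamma} = A^{\gamma}_{\alpha\beta\gamma},$$

and therefore

$$A'^{\nu}_{\lambda\mu\nu} = \frac{\partial x_\alpha}{\partial x'_\lambda}\,\frac{\partial x_\beta}{\partial x'_\mu}\,A^{\gamma}_{\alpha\beta\gamma},$$

which satisfies the requirements.]

(iv) If in this last example we had replaced μ by λ,
instead of ρ by ν, the expression would have contained

$$\frac{\partial x_\alpha}{\partial x'_\lambda}\,\frac{\partial x_\beta}{\partial x'_\lambda},$$

which has no general significance.

(v) The derivative of a scalar (§ 3 (ii)) is a covariant
vector. Suppose, e. g., that A is a scalar function of $x_\lambda$,
whose value remains constant (§ 3 (ii)) for all
transformations. Then

$$\frac{\partial A'}{\partial x'_\lambda} = \frac{\partial A}{\partial x'_\lambda} = \frac{\partial x_\alpha}{\partial x'_\lambda}\,\frac{\partial A}{\partial x_\alpha},$$

so that $\partial A / \partial x_\lambda$ is a covariant vector.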


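(vi) If the inner product of a set $\mathfrak{A}$ by each of m (= 4)
contravariant (or covariant) vectors is a tensor, then $\mathfrak{A}$ is a
tensor, and is covariant (or contravariant, respectively) as
regards $x_\mu$, where μ is the linked suffix.

[Take the case in which the vectors are contravariant,
and suppose that $\mathfrak{A} \equiv A_{\mu\nu\ldots}$. Let the m vectors be
denoted by $B^{1\mu}, B^{2\mu}, \ldots$, or, collectively, by $B^{\lambda\mu}$, the λ
not indicating any tensor character. Then, by hypothesis,
$B^{\lambda\mu}$ is a tensor as regards μ, and $B^{\lambda\mu} A_{\mu\nu\ldots}$ is a tensor as
regards ν..., so that

$$B^{\lambda\beta} = \frac{\partial x_\beta}{\partial x'_\mu}\,B'^{\lambda\mu}, \qquad B'^{\lambda\mu} A'_{\mu\nu\ldots} = \frac{\partial x_\gamma}{\partial x'_\nu} \cdots B^{\lambda\beta} A_{\beta\gamma\ldots}.$$

From these we deduce

$$B'^{\lambda\mu} A'_{\mu\nu\ldots} = B'^{\lambda\mu}\,\frac{\partial x_\beta}{\partial x'_\mu}\,\frac{\partial x_\gamma}{\partial x'_\nu} \cdots A_{\beta\gamma\ldots},$$

whence, by division by $B'^{\lambda\mu}$ (see VI. 9 (v)),

$$A'_{\mu\nu\ldots} = \frac{\partial x_\beta}{\partial x'_\mu}\,\frac{\partial x_\gamma}{\partial x'_\nu} \cdots A_{\beta\gamma\ldots}.$$

Hence $A_{\mu\nu\ldots}$ is a tensor and is covariant as regards $x_\mu$.

The case in which the vectors are covariant can be dealt
with in the same way.]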


APPENDIX 

PRODUCT OF DETERMINANTS 


In IV. 5 we have taken as the standard form for product
of two determinants (the order being now reduced from 3
to 2, for economy of printing)

$$\begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} \times \begin{vmatrix} \alpha_1 & \beta_1 \\ \alpha_2 & \beta_2 \end{vmatrix} = \begin{vmatrix} a_1\alpha_1 + a_2\beta_1 & b_1\alpha_1 + b_2\beta_1 \\ a_1\alpha_2 + a_2\beta_2 & b_1\alpha_2 + b_2\beta_2 \end{vmatrix}.$$

In this form, the element in the qth column and rth row
of the result is the ‘inner product’ of the qth column of
the first determinant and the rth row of the second. By
interchanges of columns and rows, and also by changing
the order of multiplication, we get seven other forms, all
constructed according to this rule. The eight forms can
be set out as follows:


a x b x 

X 

a l/*l 

= 

do ^2 


a 2$2 


a x b x 

X 

a l a 2 

= 

a 2 b 2 


fil P 2 


o x a 2 

X 

a l/ 8 r 

= 

b x b 2 


a 2 ) 3 2 


d x a 2 

X 

0. 1 0-2 

-= 

b x b 2 




a Al 

X 

a A 

= 



a 2 b 2 


a x a 2 

X 

a A 

= 

/ 3 x / 3 . 2 


a 2 b 2 


a i/ 8 i 

X 

a x a 0 

= 

o 2/8-2 


Kh 


°1 a 2 

X 

a x a 2 

- 

/ 8 1/82 


b x b. 



+a 2 l3 1 b x a x + b 2 f3 x 
a x a 2 + a 2 l3 2 b x a 2 + b 2 l3 2 
a x a x + a 2 a 2 b x a x +b 2 a 2 

a lfi X + a 2p2 fix + ^2 ^2 
+ a 2 a i d* b 2 )3 x 
a x a 2 + b x j3 2 a 2 a 2 + b 2 p 2 
& x O- x ‘\~b x 0. 2 ^2^1 "h ^2^2 
a x l3 x ‘\- b x ^i 2 a 2 fi x 4* b 2 (3 2 

a x a x + b x a 2 a x (3 x + b x i3 2 
a 2 a x + b 2 a 2 a 2 p x + b 2 l3 2 
a x a x + b x i3 x a x a 2 + b x p 2 
a 2 a x + b it i x a.,a 2 + b. 2 j3 2 
a x a x + a 2 a 2 a 1 i3 l + a 2 j3 2 
b x a x + b 2 a 2 b x fi x + b 2 fi 2 
a x a x +a 2 p x a x a 2 + a 2 fi 2 
b x a x +b 2 l3 x b x a 2 + b 2 p 2 


( 1 ) 

( 2 ) 

( 3 ) 

( 4 ) 

( 5 ) 

( 6 ) 
(U 
(8) 


It will be seen that the last four are the transposed of the
first four, but in the reverse order; i. e. the transposed of
(1) (2) (3) (4) are (8) (7) (6) (5).
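
The rule of construction, and the pairing of each form with its transposed, can be checked numerically. The Python sketch below is an illustration only: the numerical determinants `A` and `B` and the helper names `det2`, `transpose` and `product_form` are hypothetical, and each determinant is stored as a list of rows.

```python
def det2(m):
    """Value of a determinant of order 2 given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def transpose(m):
    """Interchange columns and rows."""
    return [list(col) for col in zip(*m)]


def product_form(first, second):
    """Array whose element in row r, column q is the inner product of
    column q of `first` and row r of `second`."""
    n = len(first)
    return [[sum(first[t][q] * second[r][t] for t in range(n))
             for q in range(n)]
            for r in range(n)]


A = [[2, 3], [5, 7]]      # stands for | a1 b1 ; a2 b2 |
B = [[1, 4], [6, 9]]      # stands for | alpha1 beta1 ; alpha2 beta2 |
At, Bt = transpose(A), transpose(B)

forms = {
    1: product_form(A, B),   2: product_form(A, Bt),
    3: product_form(At, B),  4: product_form(At, Bt),
    5: product_form(B, A),   6: product_form(Bt, A),
    7: product_form(B, At),  8: product_form(Bt, At),
}

expected = det2(A) * det2(B)
values = {number: det2(form) for number, form in forms.items()}
pairs_agree = all(transpose(forms[n]) == forms[9 - n] for n in (1, 2, 3, 4))
print(expected, values, pairs_agree)
```

All eight arrays have the same value, the product of the two determinants, and form n is the transposed of form 9 − n.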

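In the double-suffix notation these become, the order
of (5)-(8) being reversed:

$$|d_{qr}| \times |e_{qr}| = |d_{q\lambda} e_{\lambda r}| \quad (1) \qquad\qquad |e_{rq}| \times |d_{rq}| = |d_{r\lambda} e_{\lambda q}| \quad (8)$$

$$|d_{qr}| \times |e_{rq}| = |d_{q\lambda} e_{r\lambda}| \quad (2) \qquad\qquad |e_{qr}| \times |d_{rq}| = |d_{r\lambda} e_{q\lambda}| \quad (7)$$

$$|d_{rq}| \times |e_{qr}| = |d_{\lambda q} e_{\lambda r}| \quad (3) \qquad\qquad |e_{rq}| \times |d_{qr}| = |d_{\lambda r} e_{\lambda q}| \quad (6)$$

$$|d_{rq}| \times |e_{rq}| = |d_{\lambda q} e_{r\lambda}| \quad (4) \qquad\qquad |e_{qr}| \times |d_{qr}| = |d_{\lambda r} e_{q\lambda}| \quad (5)$$

It must be borne in mind that in each of these statements
q refers to the column and r to the row; e. g. (6) means
that

$$\begin{vmatrix} e_{11} & e_{12} \\ e_{21} & e_{22} \end{vmatrix} \times \begin{vmatrix} d_{11} & d_{21} \\ d_{12} & d_{22} \end{vmatrix} = \begin{vmatrix} d_{\lambda 1} e_{\lambda 1} & d_{\lambda 1} e_{\lambda 2} \\ d_{\lambda 2} e_{\lambda 1} & d_{\lambda 2} e_{\lambda 2} \end{vmatrix}.$$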


INDEX OF SYMBOLS 


| d qr | determinant 39 

di ,s (in Chapter V) ratio of cofactor of d ps to | d qr | 42 

.4^ single set 44, 58 

A pp double set 45, 59 

App transposed of A pp 45, 59 

B V C V product-sum (inner product) of B p and C p 47, 62 

| p unit set 50, 64 

3( set generally 60 

333 product of at and 33 61 

Ap„B„p inner product of A pp and B pp 63 

A p ^ inverse of A pp 65 

AW reciprocal of A^ p 67 

(Af a )k inverse of partial set A ay 70 

[A^lm-k inverse of partial set A^ 70 

B p / A 1 * ratio of B p to A M 76 

33/2 ratio of 33 to 3 82 

i>33/S3 x derivative of 33 with regard to A K 85, 86 

(X q . X r ) mean product of deviations of X q and X r 94 

X A set conjugate to X K 97 

IV improved value of U 106 

Ap va inner sum derived from A p pva 116 

A* contravariant vector, A K covariant vector 117 


GENERAL INDEX 


Under any heading, separate entries are usually in the order of occurrence, 
not in alphabetical order. 


Addition of sets 61 
Adjoint determinant 37 ; cofactor 
of element of 38 
Auxiliaries 106 

Character of tensor 118 
Cofactor 24 ; expressed as deter- 
minant 25 

Cofactors, determinant constructed
from 36-38
Cogredience 79, 84 ; of statistical 
set and improved values 109 ; 
relation to contravariance 117 
Cogredient sets 79, 84 
Column of double set 15, 45 ; of 
determinant 21 
Complete inner product 62 
Conjugate set 97 ; determination 
of 98-99 ; original set as conju- 
gate of 98 ; of inner product 101 
Conjugate sets with linear rela- 
tions 100 

Continued inner product 68 ; 
transposed of 69 ; inverse of 69 ; 
reciprocal of 69 

Contragredience 79, 84 ; with 
linear relation 80 ; of variables 
and their partial differential 
operators 88 ; of conjugate sets 
101-102 ; relation to covariance 

117 

Contragredient sets 79, 84 
Contravariance, relation of, to 
cogredience 117 

Contravariant vector 117 ; tensor 

118 

Covariance, relation of, to contra- 
gredience 117 

Covariant vector 117 ; tensor 118 


Derivative of set 85 ; of sum or 
product of sets 86 ; of function 
of a set 87 

Determinant 21 ; notation 21, 23, 
39 ; elements 21 ; column 21 ; 
row 21 ; leading diagonal 21 ; 
leading term 21 ; order of 21 ; 
calculation of 32-33 (See also 
separate headings below) 

Determinant, construction of: — 
factors of terms 14-16 ; rule of 
signs 16-19 ; sign of term 
dependent on number of re- 
versals of order 17-18 ; sign of 
term changed by interchange 
of suffixes 18 ; effect of altering 
order of letters 18-19 ; final 
definition 20-21 

Determinant, main properties of, 
in ordinary notation 40 ; in 
tensor notation 52 

Determinant, properties of : — 0 if 
each element of column or row 
is 0 22 ; not altered by inter- 
change of columns and rows 22 ; 
sign changed by interchanging 
two columns [rows] 23 ; 0 if 
two columns [rows] identical 
23 ; expression in terms of 
elements of column [row] and 
their cofactors 24, 26, 28 ; sum 
of products of elements of 
column [row] by cofactors of 
parallel elements is 0 27, 28 ; 
multiplication by single factor 
31 ; not altered by increasing 
elements of column [row] by 
multiples of parallel elements 
31 




Determinants, sum of 31 ; product
of, in ordinary notation
33-36 ; product of, in tensor
notation 48, 49 ; forms of product
of 122

Direct proportion 76, 84 
Dummy suffix 48, 62 

Elements of double set 15, 45, 59 ; 
of determinant 21 ; of single 
set 44, 58 

Error, reduction of 105 

Fitting 93 
Free suffix 48, 62
Frequency-quadratic 102 ; mean- 
ing of coefficients in 103 
Functional relation between sets 
73 

Gothic letters 60 
Graduation 93 

Greek letters for sets 43 seqq . ; for 
product-sums 47 

Improved values 106 ; criteria for 
105-110; determination of 110- 
114 

Independence, algebraical 73 n. ; 
statistical 96 

Independent observations, set 
constituted by 97, 110 
Inner multiplication 49 
Inner product 49, 62, 63; trans- 
posed of 63 ; rule for construc- 
tion of 64 

Inner product, complete 62 
Inner product, continued : see 
Continued inner product 
Inner sum 116 

Inverse of double set 65 ; con- 
dition for 65-66 ; of transposed 
set 67; original set as inverse 
of inverse set 67 

Jacobian 88 

Key set 20 

Leading diagonal of double set 
15 ; of determinant 21 


Leading term of double set 16 ; 

of determinant 21 
Linear function of single set 74 
Linear relation between single 
sets 73; between sets generally 
82 

Linked suffix 48, 62 

Matrix of determinant 59 n. 

Mean product of deviations, nota- 
tion for 94 ; algebraical laws 
satisfied by 94 

Mean-product set 96 ; of self- 
conjugate set, is unit set 97 
( See also separate headings below) 
Mean-product set, reciprocal of 
96 ; is mean -product set of 
conjugate set 99 

Mean-product set for partial varia- 
tion 104 

Minor 25 ; relation of, to co- 
factor 25-26 
Mixed tensor 118 
Moments, extension of method of 
109-110 

Multiplication of set by scalar 61 ; 
by set 61 

Notation : see Index of Symbols

Order of determinant 21 ; of single 
set 58 ; of double set 59 ; of set 
generally 60 
Outer product 49, 61 

Partial differential coefficient,
reason for appearance of 115-116
Partial double set 70 ; inverse or 
reciprocal of 70 
Partial single set 70 
Product, inner: see Inner product 
Product, outer 49, 61 
Product of determinant by single 
factor 31; of two determinants, 
see Determinants; of two sets 
45-46, 61 

Product-sum notation 47 
Proportion, direct 76, 84 ; reciprocal
77, 84

Quadratic form, expressed as inner 
product of two single sets 81 ; 




derivative of 86 ; expressed as 
sum of squares 88-89 

Rank of set 60 

Ratio, as operator 76 ; of single 
sets 76 ; of sets generally 81, 
82 ; reciprocal of 81, 82 
Ratios, equal 76, 76, 82, 83 ; in- 
version of 76, 78 ; cross-multi- 
plication of 78 

Reciprocal determinant 41-42 
Reciprocal of double set 67 ; 
original set as reciprocal of 
reciprocal set 67 
Reciprocal proportion 77, 84 
Reciprocation 67 
Reduction of error 105 
Relativity and statistical theory, 
similarity of methods for 93 
Reversals of order 17 
Row of double set 15, 45 ; of 
determinant 21 

Scalar 46, 60; as invariant 118; 

derivative of 120 
Self-conjugate set 100 
Set, double 15, 45, 59 ; deter- 
minant of 59 ; transposed of 
59 ; inverse of 65 ; reciprocal 
of 67 

Set, single 44, 58 ; as sum of component
single sets 71
Set-notation 42 ; principles of 44 
Sets, functional relation between 
73 

Simultaneous equations : method
of individual solution 11 ; formulae
for particular cases 12-13,
29 ; statement of general problem
13-14 ; trial formula for

denominator 14-19 ; general 
solution 30 

Statistical independence 96 
Statistical set, nature of 92-93 
Substitution-operator, unit set as 
64-65 

Successive summations, order im- 
material 48 
Suffixes, upper 47 
Sum of determinants 31 
Symbols : see Index of Symbols 
Symmetrical double set 45, 60; 
identity of inverse and reci- 
procal sets 67 

Tensor 115; conditions for 116- 
118; contravariant 118; co- 
variant 118 ; mixed 118; reason 
for limitation 118 
Tensors, properties of 119-121 ; 
sum or difference 119; product 
119; inner product or inner 
sum 120 

Transposed of determinant 22 ; of 
double set 59 ; of inner product 
63 ; of inverse set 67 ; of reci- 
procal set 67 

Unit set 51, 64 ; properties of 49- 
51, 64-65 ; symmetry of 51, 64 ; 
action as substitution-operator 
64-65 ; inner product of two 
unit sets 65 

Variable sets 72 

Variation, direct 79, 84 ; reci- 
procal 79, 84 

Vector, ordinary, relation to single
set 46 n., 59
Vector, tensor 117 


AN INTRODUCTION TO 

COMBINATORY ANALYSIS 


BY 

MAJOR P. A. MACMAHON, 
D.Sc., Sc.D., LL.D., F.R.S. 
Member of St John’s College, Cambridge 



PREFACE 


THIS little book is intended to be an Introduction to the two
volumes of Combinatory Analysis which were published by the
Cambridge University Press in 1915-16. It has appeared to me to be
necessary from the circumstance that some of my mathematical critics 
have found that the presentation of the general problem through the 
medium of the algebra of symmetric functions is difficult or trouble- 
some reading. I was reminded that the great Euler wrote a famous 
algebra which was addressed to his man-servant, and had the object of 
anticipating and removing every conceivable difficulty and obscurity. 
Posterity gives the verdict that, in accomplishing this he was wonder- 
fully successful. 

From a general point of view it seems to me there is advantage on 
the one hand in explaining a complicated if not difficult matter to an 
untrained mind, and on the other in propounding a simple theory for 
the benefit of those who are highly trained. In this way certain 
vantage points may be reached which are not commonly attainable by 
the usual plan of addressing students in a style which is in proportion 
to their attainments. The advantage which has been spoken of accrues 
both to the writer and to the reader. The writer for example is likely 
to be led to points of view of whose existence he was previously un- 
aware or aware of only sub-consciously. In attempting what is here 
proposed it is inevitable that much must be written that will appear 
to the reader to be self-evident and unworthy of statement. The 
intention is by a succession of such statements to arrive at facts which, 
by a quicker progression, would be difficult or troublesome to grasp. 
It is in analogy with a succession of likenesses of a person taken at 
small intervals of time such that little or no difference can be detected 
between any two successive pictures but between pictures taken at 


PREFACE 


considerable intervals there is but a mere resemblance. The subject- 
matter of the book shews I believe that the algebra of symmetric 
functions and an important part of Combinatory Analysis are beautifully 
adapted to one another, and if 1 have succeeded in making that clear 
to the reader I shall be satisfied that the object of the book has been 
attained. 

My grateful thanks are due to Professor J. E. A. Steggall, M.A. 
for much helpful criticism during the composition of the hook. 

P. A. M. 

February , 1920 . 


TABLE OF CONTENTS 


CHAPTER I 

ELEMENTARY THEORY OF SYMMETRIC FUNCTIONS 

ART. PAGE

1—3. Definitions. The Partition Notation. The Power-Sums . . 1 

4—5. The Elementary Function. Homogeneous Product-Sums . . 4 

6 — 8. Relations between the important series of functions ... 5 

9—10. Combination and Permutation of letters. Partitions and Com- 
positions of numbers 8 

11—13. Order of arrangement of combinations, permutations, partitions 

and compositions. Dictionary or Alphabetical Order . . 8 

CHAPTER II 

OPENING OF THE THEORY OF DISTRIBUTIONS 

14 — 15. Definite way of performing algebraical multiplication . .11 

16 — 20. Distribution of letters or objects into boxes. Specifications of 
objects and boxes. Multinomial Theorem. Distribution 

Function . . . 12

21—23. Examples of Distribution. Dual interpretation of Binomial 

Theorem .......... .15 

24—27. Interpretation of the product of two or more monomial sym- 
metric functions . . . . . . . .17 

28 — 29. The multiplication of symmetric functions. Derivation of for- 
mulae. The symbol of operation D m 22 

30—31. Operation of D m upon a product of functions. Connexion with 

the compositions of m . . . . . .25 

CHAPTER III 

DISTRIBUTION INTO DIFFERENT BOXES 

32—33. Determination of the enumerating function in the case of two 

boxes 27 

34—37. The general theory of any number of boxes. Operation of D m 
upon products of product-sums. Numerical methods and 

formulae 29

38—39. Restriction upon the number of similar objects that may be 
placed in similar boxes. Operation of D m in this case 


33 




CHAPTER IV 


DISTRIBUTION WHEN OBJECTS AND BOXES ARE EQUAL 
IN NUMBER 

ART. PAGE

40 — 42. Solution by means of product-sums. Interchange of Specifica- 
tion of Objects and Boxes. Theorem of symmetry in the 
algebra of product-sums. Employment of the symbol D m . 36 

43 47. Pairing of objects of two different sets of objects. Specification 

of a distribution. Restriction upon the number of similar 
objects that can be placed in similar boxes. The operation 

of D m 38 

48 — 49. Enumeration of rectangular diagrams involving compositions of 

numbers 42 

50 — 51. Equivalences of certain distributions ...... 44 


CHAPTER V 

DISTRIBUTIONS OF GIVEN SPECIFICATION 

52-58. New functions which put the specification of a distribution in evidence. Proof of symmetry in the functions. Separation of a function or of a partition. Solution of the problem of enumeration. Operation of $D_m$ upon the new functions. An example of enumeration 

59-61. Correspondence with numbered diagrams 

CHAPTER VI 

THE MOST GENERAL CASE OF DISTRIBUTION 

62-74. Distribution when the boxes are identical. Multipartite numbers and their partitions. Distribution into similar boxes identified with the partitions of multipartite numbers. Solution of the problem by means of product-sums of certain combinations. Application of symbol $D_m$. Simple particular cases 

75-77. The most general case of distribution. Application to the distribution of identical objects. Elegant theorem of distribution which depends upon conjugate partitions. Some particular examples and verifications 

78-81. Certain restricted distributions 


CHAPTER I 


ELEMENTARY THEORY OF SYMMETRIC FUNCTIONS 

1. A great part of Combinatory Analysis may be based upon the 
algebra of Symmetric Functions, and it is therefore necessary to 
have some clear definitions and simple properties of such functions 
before us. 

An algebraic function of a number of numerical magnitudes is said 
to be Symmetrical if it be unaltered when any two of the magnitudes 
are interchanged. In algebra such magnitudes (or quantities) are 
denoted by letters of the alphabet. 

Restricting ourselves to those functions which are rational, it is 
clear, for example, that the simple sum of the quantities $\alpha, \beta, \gamma, \ldots, \nu$, 
$n$ in number, is such a function. For the sum 

$$\alpha + \beta + \gamma + \ldots + \nu$$ 

is unaltered when any selected pair of the letters is interchanged. 
For this symmetric function, of which $\alpha$ is the type, we adopt the shorthand 

$$\sum \alpha.$$ 

Again, another symmetric function is 

$$\alpha^i + \beta^i + \gamma^i + \ldots + \nu^i,$$ 

because the enunciated conditions of symmetry are just as clearly 
satisfied as in the particular case $i = 1$. 

We may denote this function by 

$$\sum \alpha^i,$$ 

the representative or typical term being alone put in evidence. This 
last expression includes all the integral symmetric functions, the 
representative term of which involves one only of the quantities. If we are 
not restricted to integral functions the representative term may be any 
rational function of $\alpha$. For example 

$$\sum \frac{\alpha^i}{1 - a\alpha^i} = \frac{\alpha^i}{1 - a\alpha^i} + \frac{\beta^i}{1 - a\beta^i} + \frac{\gamma^i}{1 - a\gamma^i} + \ldots + \frac{\nu^i}{1 - a\nu^i};$$ 

but we are, in most cases, concerned with the symmetric functions which 
are integral as well as rational. 




The function $\sum \alpha^i$ is the sum of the $i$th powers of the quantities. 
It takes a leading part in the algebra of the functions. 

The laws of this algebra do not depend upon the absolute magnitudes 
of the quantities $\alpha, \beta, \gamma, \ldots, \nu$, so that usually it is not necessary 
to specify these quantities. Various notations have been adopted with 
the object of eliminating the actual magnitudes from consideration. 
Thus $\sum \alpha^i$ is sometimes denoted by $s_i$; meaning thereby the sum of the 
$i$th powers of magnitudes which it is not needful to specify either in 
magnitude or (very often) in number. Others, realising that in the 
algebra they have to deal entirely with the number $i$, have denoted the 
same function by 

$$(i),$$ 

viz. the number i in round brackets. This notation is of the greater 
importance because, as will become evident, it can be extended readily to 
rational and integral functions in general. Not only so; it is funda- 
mentally important because it supplies the connecting link between the 
algebra of symmetric functions and theories which deal with numbers 
only and not with algebraic quantities. 

2. Proceeding to functions whose representative terms involve two 
quantities, the simplest we find to be 

$$\alpha\beta + \alpha\gamma + \beta\gamma + \ldots + \mu\nu,$$ 

which involves each of the $\tfrac{1}{2}n(n-1)$ combinations, two together, of the 
$n$ quantities. It is visibly symmetrical. 

This is denoted in conformity with the conventional notation by 

$$\sum \alpha\beta,$$ 

or by $(11)$, 

the function being completely given when $n$ is known. 

Every function is considered to have a weight, which is equal to 
the sum of the numbers that, in the last notation, appear in the 
brackets. 

Thus the functions $(i)$, $(11)$ have the weights $i$, 2 respectively. 
When a number is repeated in brackets it is convenient to use 
repetitional exponents. Thus 

$(11)$ is frequently written in the form $(1^2)$. 

Of the weight one we have the single function 

$$(1);$$ 

of the weight two, the two functions 

$$(2), \quad (1^2).$$ 



Observe that two functions present themselves because two objects 
can either be taken in one lot comprising both objects, or in two lots, 
one object in each lot. We express this by saying that the number 2 
has two partitions. We have thus, of the weight two, a function corre- 
sponding to each partition of 2. 

3. In the notation of the Theory of the Partition of Numbers the 
partitions of the number 2 are denoted by $(2)$, $(1^2)$. It is for this 
reason that the notation we are employing for symmetric functions is 
termed ‘The Partition Notation.’ Similarly in correspondence with 
the three partitions of 3, viz. $(3)$, $(21)$, $(1^3)$, we have the symmetric 
functions 

$$\sum \alpha^3, \quad \sum \alpha^2\beta, \quad \sum \alpha\beta\gamma$$ 

of the weight 3. 

Of symmetric functions whose representative terms involve two of 
the $n$ quantities we have the two types in which the repetitional 
exponents are alike, or different, 

$$\sum \alpha^i\beta^i = \alpha^i\beta^i + \alpha^i\gamma^i + \beta^i\gamma^i + \ldots + \mu^i\nu^i = (i^2),$$ 

$$\sum \alpha^i\beta^j = \alpha^i\beta^j + \alpha^j\beta^i + \ldots + \mu^i\nu^j + \mu^j\nu^i = (ij),$$ 

involving $\tfrac{1}{2}n(n-1)$ and $n(n-1)$ terms respectively. 

It is now an easy step to the function 

$$\sum \alpha_1^{i_1}\alpha_2^{i_2}\alpha_3^{i_3}\ldots\alpha_s^{i_s},$$ 

wherein we have replaced the quantities $\alpha, \beta, \gamma, \ldots, \nu$ by the suffixed 
series $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_s$. 

In the partition notation we write the function 

$$(i_1 i_2 i_3 \ldots i_s),$$ 

where of course $s$ cannot be greater than $n$. 

It involves a number of terms which can be computed when we 
know the equalities that occur between the numbers $i_1, i_2, i_3, \ldots, i_s$. 

If we are thinking only of numbers, $(i_1 i_2 i_3 \ldots i_s)$ is a partition of a 
number $N = i_1 + i_2 + i_3 + \ldots + i_s$, and since a partition of $N$ is defined 
to be any collection of positive integers whose sum is $N$ we may consider 
the numbers $i_1, i_2, i_3, \ldots, i_s$ to be in descending order of magnitude. 
These numbers are called the Parts of the partition and the partition is 
said to have $s$ parts. 

The series of functions denoted by (i) for different integer values of 
i constitute a first important set. They are sometimes called one-part 
functions. 
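
The partition notation of Arts. 1 to 3 can be tried out numerically. The following Python sketch is an illustration only and not part of the original text; the function names `pad`, `monomial_terms`, `term_count` and `evaluate` are assumptions introduced here. It lists the terms of a function $(i_1 i_2 \ldots i_s)$ of $n$ quantities as exponent vectors and counts them from the equalities between the parts.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def pad(partition, n):
    """Exponents of the representative term, padded with zeros to n quantities."""
    if len(partition) > n:
        raise ValueError("the number of parts s cannot exceed n")
    return tuple(partition) + (0,) * (n - len(partition))


def monomial_terms(partition, n):
    """Distinct exponent vectors of the terms of (i_1 i_2 ... i_s) in n quantities."""
    return sorted(set(permutations(pad(partition, n))), reverse=True)


def term_count(partition, n):
    """n! divided by the factorials of the multiplicities of equal exponents."""
    multiplicities = Counter(pad(partition, n)).values()
    return factorial(n) // prod(factorial(m) for m in multiplicities)


def evaluate(partition, values):
    """Numerical value of the function at the given quantities."""
    return sum(
        prod(v ** e for v, e in zip(values, exponents))
        for exponents in monomial_terms(partition, len(values))
    )


# (i^2) and (ij) with n = 4: n(n-1)/2 and n(n-1) terms respectively.
print(term_count((2, 2), 4), term_count((3, 1), 4))
# (1^2) for the quantities 1, 2, 3 is 1*2 + 1*3 + 2*3.
print(evaluate((1, 1), (1, 2, 3)))
```

Zero exponents are counted among the equalities, which is why the count depends on $n$ as well as on the parts.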




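4. A second important set is constituted by those functions which 
are denoted by partitions in which only unity appears as a part. It is 

$$(1), \quad (1^2), \quad (1^3), \quad \ldots, \quad (1^n),$$ 

or 

$$\sum \alpha_1, \quad \sum \alpha_1\alpha_2, \quad \sum \alpha_1\alpha_2\alpha_3, \quad \ldots, \quad \sum \alpha_1\alpha_2\alpha_3\ldots\alpha_n.$$ 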

These are sometimes called unitary functions. 

The set is particularly connected with the Theory of Algebraic 
Equations because 

$$(x-\alpha)(x-\beta)(x-\gamma)\ldots(x-\nu) = x^n - \sum\alpha \cdot x^{n-1} + \sum\alpha\beta \cdot x^{n-2} - \sum\alpha\beta\gamma \cdot x^{n-3} + \ldots,$$ 

the last term being $\pm\,\alpha\beta\gamma\ldots\nu$, according as $n$ is even or uneven. 
Hence considering the equation 

$$x^n - a_1x^{n-1} + a_2x^{n-2} - a_3x^{n-3} + \ldots + (-)^n a_n = 0,$$ 

and supposing the $n$ roots to be 

$$\alpha, \beta, \gamma, \ldots, \nu,$$ 

it is clear that 

$$x^n - a_1x^{n-1} + a_2x^{n-2} - \ldots + (-)^n a_n = x^n - \sum\alpha \cdot x^{n-1} + \sum\alpha\beta \cdot x^{n-2} - \ldots + (-)^n \alpha\beta\gamma\ldots\nu,$$ 

and we at once deduce the relations 

$$a_1 = \sum\alpha,$$ 

$$a_2 = \sum\alpha\beta,$$ 

$$a_3 = \sum\alpha\beta\gamma,$$ 

$$\ldots$$ 

$$a_n = \alpha\beta\gamma\ldots\nu.$$ 

These functions are frequently called ‘ elementary ’ symmetric functions 
because they arise in this simple manner. 
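
The connexion with algebraic equations can be checked on a small case. The sketch below is an illustration under stated assumptions, not the author's procedure: `elementary` and `expand_product` are names invented here. It multiplies out $(x-\alpha)(x-\beta)\ldots$ for given numerical roots and compares the coefficients with the elementary functions taken with alternate signs.

```python
from itertools import combinations
from math import prod


def elementary(k, roots):
    """a_k: the sum of the products of the roots taken k together (a_0 = 1)."""
    return sum(prod(c) for c in combinations(roots, k))


def expand_product(roots):
    """Coefficients of (x - r_1)(x - r_2)..., highest power of x first."""
    coeffs = [1]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [c - r * prev for c, prev in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


roots = (1, 2, 3)
print(expand_product(roots))
print([(-1) ** k * elementary(k, roots) for k in range(len(roots) + 1)])
```

Both printed lists should agree, the second being $1, -a_1, a_2, -a_3$ for the three chosen roots.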

It is sometimes convenient, undoubtedly, to regard the quantities 
$\alpha, \beta, \gamma, \ldots, \nu$ as being the roots of an equation, the left-hand side of 
which involves the elementary functions with alternately positive and 
negative signs, but the notion is not essential to the study of the 
subject of symmetric functions. 

5. There is a third important series of functions. 

Of the weight w there are functions which in the partition notation 
are denoted by partitions of the number w. 

There is one function corresponding to every such partition. 

Such a function, since it is denoted by a single partition, is called a 
Monomial Symmetric Function. 




If we add all such functions which have the same weight $w$ we obtain, 
algebraically speaking, all the products $w$ together of the quantities 
$\alpha, \beta, \gamma, \ldots, \nu$, repetitions permissible. 

Such a sum is called the Homogeneous Product-Sum of weight $w$ of 
the $n$ quantities. 

It is usually denoted by $h_w$. 

We have 

$$h_1 = (1) = \sum\alpha,$$ 

$$h_2 = (2) + (1^2) = \sum\alpha^2 + \sum\alpha\beta,$$ 

$$h_3 = (3) + (21) + (1^3) = \sum\alpha^3 + \sum\alpha^2\beta + \sum\alpha\beta\gamma,$$ 

and so forth. 
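
The statement that $h_w$ is the sum of all the monomial functions of weight $w$ can be verified for chosen numbers. The following sketch is illustrative only; the helper names `partitions`, `monomial`, `h_direct` and `h_from_monomials` are assumptions made here. It computes $h_w$ once as the sum of all products $w$ together, repetitions permissible, and once as the sum over the partitions of $w$.

```python
from itertools import combinations_with_replacement, permutations
from math import prod


def partitions(w, largest=None):
    """Partitions of w, each with its parts in descending order."""
    if largest is None:
        largest = w
    if w == 0:
        yield ()
        return
    for first in range(min(w, largest), 0, -1):
        for rest in partitions(w - first, first):
            yield (first,) + rest


def monomial(partition, values):
    """Value of the monomial symmetric function denoted by the partition."""
    n = len(values)
    if len(partition) > n:
        return 0  # more parts than quantities: the function has no terms
    padded = partition + (0,) * (n - len(partition))
    return sum(
        prod(v ** e for v, e in zip(values, exponents))
        for exponents in set(permutations(padded))
    )


def h_direct(w, values):
    """All products w together of the quantities, repetitions permissible."""
    return sum(prod(c) for c in combinations_with_replacement(values, w))


def h_from_monomials(w, values):
    """Sum of the monomial functions over every partition of w."""
    return sum(monomial(p, values) for p in partitions(w))


values = (1, 2, 3)
for w in range(1, 6):
    print(w, h_direct(w, values), h_from_monomials(w, values))
```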

We have before us the three sets of functions 

$$s_1, s_2, s_3, \ldots, s_w, \ldots,$$ 

$$a_1, a_2, a_3, \ldots, a_n,$$ 

$$h_1, h_2, h_3, \ldots, h_w, \ldots.$$ 

The first and third sets contain an infinite number of members, but 
the second set only involves $n$ members where $n$ is the number of the 
quantities $\alpha, \beta, \gamma, \ldots$ 


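6. The identity of Art. 4 which connects the functions $a_1, a_2, a_3, \ldots$ 
with $\alpha, \beta, \gamma, \ldots$ may be written, by putting $1/y$ for $x$, 

$$1 - a_1y + a_2y^2 - \ldots + (-)^n a_ny^n \equiv (1-\alpha y)(1-\beta y)\ldots(1-\nu y),$$ 

or in the form 

$$\frac{1}{1 - a_1y + a_2y^2 - \ldots + (-)^n a_ny^n} = \frac{1}{(1-\alpha y)(1-\beta y)\ldots(1-\nu y)}.$$ 

If we expand the last fraction in ascending powers of $y$, we obtain, in 
the first place, 

$$\begin{aligned} 1 &+ (\alpha+\beta+\gamma+\ldots+\nu)\,y \\ &+ (\alpha^2+\beta^2+\gamma^2+\ldots+\nu^2+\alpha\beta+\alpha\gamma+\beta\gamma+\ldots+\mu\nu)\,y^2 \\ &+ (\alpha^3+\beta^3+\gamma^3+\ldots+\nu^3+\alpha^2\beta+\alpha\beta^2+\ldots+\mu^2\nu+\mu\nu^2+\alpha\beta\gamma+\alpha\beta\delta+\ldots+\lambda\mu\nu)\,y^3 \\ &+ \ldots \end{aligned}$$ 

It is clear that the coefficient of $y^w$ is the homogeneous product-sum 
of weight $w$; so that we may write 

$$\frac{1}{1 - a_1y + a_2y^2 - \ldots + (-)^n a_ny^n} = 1 + h_1y + h_2y^2 + \ldots + h_wy^w + \ldots,$$ 

an identity. 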




Thence we obtain 

$$(1 - a_1y + a_2y^2 - \ldots + (-)^n a_ny^n)(1 + h_1y + h_2y^2 + \ldots + h_wy^w + \ldots) = 1.$$ 

Since this is an identity we may multiply out the left-hand side and 
equate the coefficients of the successive powers of $y$ to zero; obtaining 

$$h_1 - a_1 = 0,$$ 

$$h_2 - a_1h_1 + a_2 = 0,$$ 

$$h_3 - a_1h_2 + a_2h_1 - a_3 = 0,$$ 

$$\ldots$$ 

$$h_n - a_1h_{n-1} + a_2h_{n-2} - \ldots + (-)^n a_n = 0,$$ 

$$h_{n+1} - a_1h_n + a_2h_{n-1} - \ldots + (-)^n a_nh_1 = 0,$$ 

$$h_{n+2} - a_1h_{n+1} + a_2h_n - \ldots + (-)^n a_nh_2 = 0,$$ 

relations which enable us to express any function $h_w$ in terms of 
members of the series $a_1, a_2, a_3, \ldots, a_n$. 
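
These relations give a step-by-step rule for the $h_w$ once the $a_k$ are known. The sketch below is a minimal illustration, assuming numerical quantities; the names `elementary`, `h_direct` and `h_by_recurrence` are introduced here and are not the author's. It solves each relation for its leading $h_w$ and compares the result with the product-sum formed directly.

```python
from itertools import combinations, combinations_with_replacement
from math import prod


def elementary(k, values):
    """a_k: the sum of the products of the quantities taken k together."""
    return sum(prod(c) for c in combinations(values, k))


def h_direct(w, values):
    """h_w formed directly as a homogeneous product-sum."""
    return sum(prod(c) for c in combinations_with_replacement(values, w))


def h_by_recurrence(w_max, values):
    """h_0 .. h_{w_max} from h_w = a_1 h_{w-1} - a_2 h_{w-2} + ... (a_k = 0 for k > n)."""
    n = len(values)
    a = [elementary(k, values) for k in range(n + 1)]  # a[0] = 1
    h = [1]
    for w in range(1, w_max + 1):
        h.append(sum((-1) ** (k + 1) * a[k] * h[w - k]
                     for k in range(1, min(w, n) + 1)))
    return h


values = (1, 2, 3)
print(h_by_recurrence(6, values))
print([h_direct(w, values) for w in range(7)])
```

The two printed lists should coincide; the relations beyond the $n$th are the ones in which no new $a_k$ enters.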

7. In the applications to combinatory analysis it usually happens that 
we may regard n as being indefinitely great and then the relations are 
simply 

$$h_1 - a_1 = 0,$$ 

$$h_2 - a_1h_1 + a_2 = 0,$$ 

$$h_3 - a_1h_2 + a_2h_1 - a_3 = 0,$$ 

continued indefinitely. 

The before-written identity now becomes 

$$(1 - a_1y + a_2y^2 - a_3y^3 + \ldots \text{ ad inf.})(1 + h_1y + h_2y^2 + h_3y^3 + \ldots \text{ ad inf.}) = 1,$$ 

and herein writing $-y$ for $y$ and transposing the factors we find 

$$(1 - h_1y + h_2y^2 - h_3y^3 + \ldots \text{ ad inf.})(1 + a_1y + a_2y^2 + a_3y^3 + \ldots \text{ ad inf.}) = 1,$$ 

an identity which is derivable from the former by interchange of the 
symbols $a$ and $h$. 

There is thus perfect symmetry between the symbols and it follows 
as a matter of course that in any relation connecting the quantities 
$a_1, a_2, a_3, \ldots$ with the quantities $h_1, h_2, h_3, \ldots$ we are at liberty to 
interchange the symbols $a$, $h$. This interesting fact can be at once verified in 
the case of the relations $h_1 - a_1 = 0$, etc. 

Solving these equations we find 

$$h_1 = a_1,$$ 

$$h_2 = a_1^2 - a_2,$$ 

$$h_3 = a_1^3 - 2a_1a_2 + a_3,$$ 

and as shewn in works upon algebra 

$$h_w = \sum (-)^{w+\pi_1+\pi_2+\ldots+\pi_k}\,\frac{(\pi_1+\pi_2+\ldots+\pi_k)!}{\pi_1!\,\pi_2!\ldots\pi_k!}\,a_1^{\pi_1}a_2^{\pi_2}\ldots a_k^{\pi_k},$$ 

where $\pi_1!$ denotes the factorial of $\pi_1$ and 

$$\pi_1 + 2\pi_2 + 3\pi_3 + \ldots + k\pi_k = w,$$ 

the summation being taken for all sets of positive integers (zero included) 
$\pi_1, \pi_2, \ldots, \pi_k$ which satisfy this equation. 

By interchange of symbols we pass to the relations 

$$a_1 = h_1,$$ 

$$a_2 = h_1^2 - h_2,$$ 

$$a_3 = h_1^3 - 2h_1h_2 + h_3,$$ 

$$a_w = \sum (-)^{w+\pi_1+\pi_2+\ldots+\pi_k}\,\frac{(\pi_1+\pi_2+\ldots+\pi_k)!}{\pi_1!\,\pi_2!\ldots\pi_k!}\,h_1^{\pi_1}h_2^{\pi_2}\ldots h_k^{\pi_k}.$$ 


8. It is shewn in works upon algebra that the relations between the 
symbols $s_1, s_2, s_3, \ldots$ and the symbols $a_1, a_2, a_3, \ldots$ are 

$$s_1 = a_1,$$ 

$$s_2 = a_1^2 - 2a_2,$$ 

$$s_3 = a_1^3 - 3a_1a_2 + 3a_3,$$ 

$$s_w = \sum (-)^{w+\pi_1+\pi_2+\ldots+\pi_k}\,\frac{(\pi_1+\pi_2+\ldots+\pi_k-1)!\;w}{\pi_1!\,\pi_2!\ldots\pi_k!}\,a_1^{\pi_1}a_2^{\pi_2}\ldots a_k^{\pi_k},$$ 

$$a_1 = s_1,$$ 

$$2!\,a_2 = s_1^2 - s_2,$$ 

$$3!\,a_3 = s_1^3 - 3s_1s_2 + 2s_3,$$ 

$$w!\,a_w = \sum (-)^{w+\pi_1+\pi_2+\ldots+\pi_k}\,\frac{w!}{1^{\pi_1}2^{\pi_2}\ldots k^{\pi_k}\,\pi_1!\,\pi_2!\ldots\pi_k!}\,s_1^{\pi_1}s_2^{\pi_2}\ldots s_k^{\pi_k};$$ 

also between the symbols $s_1, s_2, s_3, \ldots$ and $h_1, h_2, h_3, \ldots$ 

$$s_1 = h_1,$$ 

$$s_2 = -(h_1^2 - 2h_2),$$ 

$$s_3 = h_1^3 - 3h_1h_2 + 3h_3,$$ 

$$s_w = \sum (-)^{\pi_1+\pi_2+\ldots+\pi_k+1}\,\frac{(\pi_1+\pi_2+\ldots+\pi_k-1)!\;w}{\pi_1!\,\pi_2!\ldots\pi_k!}\,h_1^{\pi_1}h_2^{\pi_2}\ldots h_k^{\pi_k},$$ 

$$h_1 = s_1,$$ 

$$2!\,h_2 = s_1^2 + s_2,$$ 

$$3!\,h_3 = s_1^3 + 3s_1s_2 + 2s_3,$$ 

$$w!\,h_w = \sum \frac{w!}{1^{\pi_1}2^{\pi_2}\ldots k^{\pi_k}\,\pi_1!\,\pi_2!\ldots\pi_k!}\,s_1^{\pi_1}s_2^{\pi_2}\ldots s_k^{\pi_k}.$$ 


These are the principal properties of symmetric functions that will be 
of use. 
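
The general formulae expressing $a_w$ and $h_w$ by the power-sums lend themselves to a numerical check. The sketch below is an illustration only, with helper names (`partitions`, `power_sum`, `from_power_sums`) assumed here; each partition of $w$ is read as a set of repetitional numbers $\pi_1, \pi_2, \ldots$, and exact fractions are used so that the division by $1^{\pi_1}2^{\pi_2}\ldots\pi_1!\,\pi_2!\ldots$ introduces no rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, combinations_with_replacement
from math import factorial, prod


def partitions(w, largest=None):
    """Partitions of w, each with its parts in descending order."""
    if largest is None:
        largest = w
    if w == 0:
        yield ()
        return
    for first in range(min(w, largest), 0, -1):
        for rest in partitions(w - first, first):
            yield (first,) + rest


def power_sum(k, values):
    """s_k: the sum of the kth powers of the quantities."""
    return sum(v ** k for v in values)


def from_power_sums(w, values, alternating):
    """a_w (alternating=True) or h_w (alternating=False) expressed by s_1, s_2, ..."""
    total = Fraction(0)
    for p in partitions(w):
        term = Fraction(1)
        for k, pi_k in Counter(p).items():  # pi_k = number of parts equal to k
            term *= Fraction(power_sum(k, values) ** pi_k,
                             k ** pi_k * factorial(pi_k))
        sign = (-1) ** (w + len(p)) if alternating else 1
        total += sign * term
    return total


values = (1, 2, 3)
for w in range(1, 5):
    a_w = sum(prod(c) for c in combinations(values, w))
    h_w = sum(prod(c) for c in combinations_with_replacement(values, w))
    print(w, from_power_sums(w, values, True) == a_w,
          from_power_sums(w, values, False) == h_w)
```

With three quantities $a_4$ vanishes, and the alternating sum over the partitions of 4 reproduces that zero.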


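9. If we take any assemblage of letters such as $\alpha\alpha\beta\gamma$ and are not 
concerned with the order in which these letters are written, we have a 
‘Combination’ of the letters. If however the order in which the letters 
are written be taken into account, we have a ‘Permutation’ of the 
letters. In the present case we have twelve permutations, viz. 

$$\begin{array}{cccccc} \alpha\alpha\beta\gamma & \alpha\alpha\gamma\beta & \alpha\beta\alpha\gamma & \alpha\beta\gamma\alpha & \alpha\gamma\alpha\beta & \alpha\gamma\beta\alpha \\ \beta\alpha\alpha\gamma & \beta\alpha\gamma\alpha & \beta\gamma\alpha\alpha & \gamma\alpha\alpha\beta & \gamma\alpha\beta\alpha & \gamma\beta\alpha\alpha \end{array}$$ 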

10. In a similar manner if we take any collection of integers which 
add up to a given integer we have as above defined (Art. 3) a partition 
of the given number; here no account is taken of the order in which 
the parts of the partition may be written; but if order has to be taken 
into account each way of writing the parts is called a ‘Composition’ of 
the number, such composition appertaining to the particular partition 
which is involved. Thus of the number 9, $3321 \equiv 3^2 21$ is a partition 
which gives rise to the twelve compositions: 

$$\begin{array}{cccccc} 3321 & 3312 & 3231 & 3213 & 3132 & 3123 \\ 2331 & 2313 & 2133 & 1332 & 1323 & 1233 \end{array}$$ 

and it will be noticed that the compositions which appertain to the 
partition 3321 of the number 9 are in correspondence with the 
permutations of the combination $\alpha\alpha\beta\gamma$. 

Moreover, in general the compositions which appertain to any given 
partition of a number are in correspondence with the permutations of a 
certain combination of letters. 
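
The correspondence between compositions and permutations can be exhibited mechanically. In the sketch below, which is illustrative only, the letters `a`, `b`, `c` stand for $\alpha, \beta, \gamma$, and the helper `distinct_arrangements` and the dictionary `part_of_letter` are names assumed here.

```python
from itertools import permutations


def distinct_arrangements(items):
    """All distinct orderings of a collection that may contain repeated members."""
    return sorted(set(permutations(items)))


# Permutations of the combination alpha alpha beta gamma.
perms = distinct_arrangements("aabc")
# Compositions appertaining to the partition 3321 of the number 9.
comps = distinct_arrangements((3, 3, 2, 1))

# Replace each letter by a part: alpha -> 3, beta -> 2, gamma -> 1.
part_of_letter = {"a": 3, "b": 2, "c": 1}
translated = {tuple(part_of_letter[ch] for ch in p) for p in perms}

print(len(perms), len(comps), translated == set(comps))
```

The choice of which letter answers to which part is arbitrary, so long as the repeated letter answers to the repeated part.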

11. In pursuing the main object of this book, namely the study of 
the algebra of symmetric functions together with those theories of 
combination, permutation, arrangement, order and distribution which are 
summed up in the title ‘Combinatory Analysis,’ it is important to have 
some specific rules for arranging the order in which the terms of algebraic 
expressions are written down. 




A monomial symmetric function, as defined in Art. 5, is the sum of 
a number of different combinations of the same type. In writing out at 
length these combinations of quantities $\alpha, \beta, \ldots, \nu$ we may adopt what 
is called the ‘dictionary’ (or alternatively ‘alphabetical’) order. 

In any particular combination we write the $\alpha$’s first, then the $\beta$’s, 
and so forth; also one combination is given priority of another if, 
considering the two combinations to be words, the dictionary would 
give the one word before the other. 

This allusion to the dictionary, with which all persons are familiar, 
seems to define shortly and clearly the principle of order usually 
adopted. Thus we write the monomial function of four quantities $\alpha, \beta, \gamma, \delta$ 

$$\sum \alpha^2\beta\gamma$$ 

in the order 

$$\alpha^2\beta\gamma + \alpha^2\beta\delta + \alpha^2\gamma\delta + \alpha\beta^2\gamma + \alpha\beta^2\delta + \alpha\beta\gamma^2 + \alpha\beta\delta^2 + \alpha\gamma^2\delta + \alpha\gamma\delta^2 + \beta^2\gamma\delta + \beta\gamma^2\delta + \beta\gamma\delta^2,$$ 

the dictionary order being in evidence both in each combination and in 
the order of the combinations. 

Another order is sometimes useful. We may have, in each combination, 
the repetitional numbers always in the same order as they 
appear in the representative combination but, subject to this rule, the 
letters in dictionary order. 

The combinations thus written would then be arranged in dictionary 
order. Thus we might write 

$$\sum \alpha^2\beta\gamma = \alpha^2\beta\gamma + \alpha^2\beta\delta + \alpha^2\gamma\delta + \beta^2\alpha\gamma + \beta^2\alpha\delta + \beta^2\gamma\delta + \gamma^2\alpha\beta + \gamma^2\alpha\delta + \gamma^2\beta\delta + \delta^2\alpha\beta + \delta^2\alpha\gamma + \delta^2\beta\gamma.$$ 

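12. Again, frequently we have to write out at length the 
permutations of a given combination of letters. Here again it is usual to 
adopt the dictionary order, each permutation being regarded as a 
dictionary word. Thus the twelve permutations of $\alpha\alpha\beta\gamma$ are written 

$$\begin{array}{cccccc} \alpha\alpha\beta\gamma & \alpha\alpha\gamma\beta & \alpha\beta\alpha\gamma & \alpha\beta\gamma\alpha & \alpha\gamma\alpha\beta & \alpha\gamma\beta\alpha \\ \beta\alpha\alpha\gamma & \beta\alpha\gamma\alpha & \beta\gamma\alpha\alpha & \gamma\alpha\alpha\beta & \gamma\alpha\beta\alpha & \gamma\beta\alpha\alpha. \end{array}$$ 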

13. When we have to write out symmetric functions, of the same 
weight, expressed in partition notation, we usually adopt numerical 
order, the meaning of which will be clear from the example 

$$h_6 = (6) + (51) + (42) + (411) + (33) + (321) + (3111) + (222) + (2211) + (21111) + (111111),$$ 




where in each term the largest number available is written first, the 
next largest second, and so on ; and in ordering the partitions numbers 
in descending order of magnitude are in the same relation as are the 
successive letters of the alphabet in dictionary order. The alternative 
method is to adopt the numerical order subject to the rule that the 
partitions are to be arranged in ascending order in regard to the number 
of parts involved. 

This would give the expression 

$$h_6 = (6) + (51) + (42) + (33) + (411) + (321) + (222) + (3111) + (2211) + (21111) + (111111).$$ 

When we have to write out the compositions associated with a given 
partition we adopt numerical order. 

Thus associated with the partition ( 321 ) of the number 6 we have 
the compositions 

$$(321), \quad (312), \quad (231), \quad (213), \quad (132), \quad (123).$$ 
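
Both ways of ordering the partitions, and the ordering of the compositions, can be generated rather than written out by hand. The sketch below is illustrative; `partitions`, `by_number_of_parts` and `compositions` are names assumed here, and the partitions are represented as tuples of parts.

```python
from itertools import permutations


def partitions(w, largest=None):
    """Partitions of w in numerical order: largest available part first."""
    if largest is None:
        largest = w
    if w == 0:
        yield ()
        return
    for first in range(min(w, largest), 0, -1):
        for rest in partitions(w - first, first):
            yield (first,) + rest


def by_number_of_parts(partition_list):
    """Ascending number of parts; numerical order kept among equal counts."""
    return sorted(partition_list, key=len)  # Python's sort is stable


def compositions(partition):
    """Compositions appertaining to a partition, in numerical order."""
    return sorted(set(permutations(partition)), reverse=True)


numerical = list(partitions(6))
alternative = by_number_of_parts(numerical)

print(numerical)
print(alternative)
print(compositions((3, 2, 1)))
```

The second ordering relies on the stability of the sort: partitions with the same number of parts keep the relative order they had in the first list.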


CHAPTER II 


OPENING OF THE THEORY OF DISTRIBUTIONS 

14. From the principles set forth in the concluding articles of 
Chapter I we can realise a definite way of expressing the result of 
algebraical multiplication. 

Suppose that we have to form the product of a number of alge- 
braical expressions each of which involves (say) three terms. The 
expressions are supposed to be given in a definite order from left to 
right. This order will be determined, usually, by the circumstances. 

Let the factors be n in number, and, in the given definite order, 
denoted by 

$$(a_1 + b_1 + c_1)(a_2 + b_2 + c_2)(a_3 + b_3 + c_3)\ldots(a_{n-1} + b_{n-1} + c_{n-1})(a_n + b_n + c_n),$$ 

where three terms are involved in each factor merely for the sake of 
simplicity. 

To obtain a term of the product we select a term (any term) from 
each factor and place them in contact in the order in which they have 
been selected; the factors being dealt with in order from left to right. 
The term of the product, thus reached, may involve one, two or three 
of the symbols a, b, c according to the way that the selection is carried 
out. To place the terms ($3^n$ in number) thus arrived at in a definite 
order we make our selections in such wise that the terms produced are 
in dictionary order. Thus the first three terms will be 

$$a_1a_2a_3\ldots a_{n-1}a_n,$$ 

$$a_1a_2a_3\ldots a_{n-1}b_n,$$ 

$$a_1a_2a_3\ldots a_{n-1}c_n,$$ 

and the last three 

$$c_1c_2c_3\ldots c_{n-1}a_n,$$ 

$$c_1c_2c_3\ldots c_{n-1}b_n,$$ 

$$c_1c_2c_3\ldots c_{n-1}c_n.$$ 
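
The rule of selection described above is what a nested enumeration of choices, factor by factor from left to right, produces. The sketch below is an illustration for $n = 3$; the name `ordered_terms` and the string labels such as `a1` are assumptions made here. It uses `itertools.product`, which varies the choice from the last factor fastest.

```python
from itertools import product


def ordered_terms(factors):
    """Terms of the product: one symbol from each factor, factors taken left to right."""
    return ["".join(choice) for choice in product(*factors)]


n = 3
factors = [(f"a{k}", f"b{k}", f"c{k}") for k in range(1, n + 1)]
terms = ordered_terms(factors)

print(len(terms))   # 3**n terms in all
print(terms[:3])    # the first three terms
print(terms[-3:])   # the last three terms
```

Because the symbols within each factor are supplied in alphabetical order, the terms come out in dictionary order without any subsequent sorting.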


15. As an example, consider the development of $(\alpha + \beta)^n$. 
Writing down the $n$ factors 

$$(\alpha+\beta)(\alpha+\beta)(\alpha+\beta)\ldots(\alpha+\beta)(\alpha+\beta),$$ 




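the multiplication, according to rule, gives for a few terms 

$$\alpha^n + \alpha^{n-1}\beta + \alpha^{n-2}\beta\alpha + \alpha^{n-2}\beta\beta + \alpha^{n-3}\beta\alpha\alpha + \alpha^{n-3}\beta\alpha\beta + \alpha^{n-3}\beta\beta\alpha + \alpha^{n-3}\beta\beta\beta + \ldots,$$ 

and the complete product for $n = 4$ is 

$$\alpha^4 + \alpha^3\beta + \alpha^2\beta\alpha + \alpha^2\beta^2 + \alpha\beta\alpha^2 + \alpha\beta\alpha\beta + \alpha\beta^2\alpha + \alpha\beta^3 + \beta\alpha^3 + \beta\alpha^2\beta + \beta\alpha\beta\alpha + \beta\alpha\beta\beta + \beta\beta\alpha\alpha + \beta\beta\alpha\beta + \beta\beta\beta\alpha + \beta^4.$$ 

The combinations which appear are 

$$\alpha^4, \quad \alpha^3\beta, \quad \alpha^2\beta^2, \quad \alpha\beta^3, \quad \beta^4,$$ 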

and the product, as obtained by rule, involves all the permutations of 
these combinations, and no other terms. 

The terms of the product are visibly in dictionary order and from 
the way in which the multiplication has been carried out each of the 
combinations necessarily appears as many times as it possesses permu- 
tations ; so that when the terms are assembled so as to yield the 
formula of the binomial theorem 

$$(\alpha+\beta)^4 = \alpha^4 + 4\alpha^3\beta + 6\alpha^2\beta^2 + 4\alpha\beta^3 + \beta^4,$$ 

each numerical coefficient denotes the number of permutations of the 
combination of letters to which it is attached. The same remark can be 
made in regard to the general formula 

$$(\alpha+\beta)^n = \alpha^n + \binom{n}{1}\alpha^{n-1}\beta + \binom{n}{2}\alpha^{n-2}\beta^2 + \ldots + \binom{n}{1}\alpha\beta^{n-1} + \beta^n.$$ 

16. We proceed to another order of ideas by connecting the theory 
above sketched with a Distribution into different Boxes. 

Suppose that we have four different (that is distinguishable) boxes 
$A_1, A_2, A_3, A_4$ arranged in order from left to right 

$$\begin{array}{cccc} A_1 & A_2 & A_3 & A_4 \\ \alpha & \alpha & \beta & \beta \\ \alpha & \beta & \alpha & \beta \\ \alpha & \beta & \beta & \alpha \\ \beta & \alpha & \alpha & \beta \\ \beta & \alpha & \beta & \alpha \\ \beta & \beta & \alpha & \alpha \end{array}$$ 

and let us consider the selections of factor terms that were made in 
forming the product combination $\alpha^2\beta^2$. 

For the first selection we took the terms $\alpha, \alpha, \beta, \beta$ from the first, 
second, third and fourth factors respectively. Place these letters in the 
four boxes $A_1, A_2, A_3, A_4$ respectively. Proceed in the same way for 
each of the six selections that produce the combination $\alpha^2\beta^2$. We 
obtain the successive lines of letters shewn in the above scheme. We 
observe that we have distributed the four letters $\alpha, \alpha, \beta, \beta$ into the four 
different boxes, one letter into each box, in every possible way, and that 
reading the lines of letters from left to right one such distribution 
corresponds to each permutation of the combination. From the mode 
of term selection, to form the product, each permutation must occur 
and there can be no other distributions into the boxes except those which 
correspond to the permutations. 

We thus see that, if the binomial expression 

$$(\alpha+\beta)^n$$ 

be expanded, the coefficient of the term $\alpha^m\beta^{n-m}$ is equal 

(i) to the number of permutations of the combination $\alpha^m\beta^{n-m}$, 

(ii) to the number of ways in which the letters of the combination 
$\alpha^m\beta^{n-m}$ can be distributed into $n$ different boxes, one letter into 
each box. 
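
The double meaning of the binomial coefficient can be observed by forming the product without collecting terms. In the sketch below, which is illustrative only, `a` and `b` stand for $\alpha$ and $\beta$, each word of $n$ letters is read both as a term of the product and as the contents of the boxes $A_1, \ldots, A_n$, and the names `binomial_words` and `coefficients` are assumptions made here.

```python
from collections import Counter
from itertools import product
from math import comb


def binomial_words(n):
    """The 2**n terms of (a + b)^n before collection, in dictionary order."""
    return ["".join(w) for w in product("ab", repeat=n)]


def coefficients(n):
    """Number of words belonging to each combination a^(n-k) b^k, k = 0..n."""
    counts = Counter("".join(sorted(w)) for w in binomial_words(n))
    return [counts["a" * (n - k) + "b" * k] for k in range(n + 1)]


print(coefficients(4))
print([comb(4, k) for k in range(5)])
# The six distributions of a, a, b, b into four boxes, one letter in each box.
print([w for w in binomial_words(4) if sorted(w) == list("aabb")])
```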

17. By precisely the same argument we reach the conclusion that if 
the multinomial expression 

$$(\alpha_1 + \alpha_2 + \alpha_3 + \ldots + \alpha_n)^i$$ 

be expanded, the coefficient of the term 

$$\alpha_1^{i_1}\alpha_2^{i_2}\alpha_3^{i_3}\ldots\alpha_s^{i_s}$$ 

is equal 

(i) to the number of permutations of the combination $\alpha_1^{i_1}\alpha_2^{i_2}\ldots\alpha_s^{i_s}$, 

(ii) to the number of ways in which the letters of the combination 
$\alpha_1^{i_1}\alpha_2^{i_2}\ldots\alpha_s^{i_s}$ can be distributed into $i$ different boxes, so that 
each box contains one letter. 

We now remark that since $\alpha_1 + \alpha_2 + \alpha_3 + \ldots + \alpha_n$ is a symmetric function 
of the $n$ quantities, the expression 

$$(\alpha_1 + \alpha_2 + \alpha_3 + \ldots + \alpha_n)^i$$ 

is also a symmetric function. Hence every term of the expansion which 
is also a term of the function 

$$\sum \alpha_1^{i_1}\alpha_2^{i_2}\alpha_3^{i_3}\ldots\alpha_s^{i_s} = (i_1 i_2 i_3 \ldots i_s)$$ 

must appear with the same coefficient. 

Hence we may say that when 

$$(\alpha_1 + \alpha_2 + \alpha_3 + \ldots + \alpha_n)^i$$ 

is expanded the coefficient of symmetric function 

$$\sum \alpha_1^{i_1}\alpha_2^{i_2}\ldots\alpha_s^{i_s}$$ 

is equal 

(i) to the number of permutations of the combination $\alpha_1^{i_1}\alpha_2^{i_2}\ldots\alpha_s^{i_s}$, 

(ii) to the number of ways in which the letters of the combination 
$\alpha_1^{i_1}\alpha_2^{i_2}\ldots\alpha_s^{i_s}$ can be distributed into $i$ different boxes, one 
letter into each box. 

18. We have spoken of the distribution of letters into boxes. The 
letters may represent objects or things and it is often more convenient 
to speak of the distribution of objects rather than of letters. The 
objects are sufficiently specified by the repetitional numbers $i_1, i_2, \ldots, i_s$. 
We can therefore properly define the objects distributed or permuted 
by saying that they have a Specification 

$$(i_1 i_2 i_3 \ldots i_s)$$ 

which is necessarily some partition of $i$, the whole number of objects. 

Also the number of ways in which the distribution into boxes can take 
place depends upon the identities that may exist between them. The 
boxes being such that there are no identities, in fact representable by 

$$A_1, A_2, A_3, \ldots, A_i,$$ 

we have only to regard the repetitional numbers, which in this case 
consist of $i$ units. The Box Specification is therefore 

$$(1^i),$$ 

and the distribution which we have had under view may be described as 
of objects specified by $(i_1 i_2 i_3 \ldots i_s)$ into boxes specified by $(1^i)$, where 

$$i_1 + i_2 + i_3 + \ldots + i_s = i.$$ 

19. The actual number of permutations or distributions is readily 
obtained. If the objects be all different, or in other words of 
specification $(1^i)$, we may select an object for box $A_1$ in $i$ ways; from the $i-1$ 
remaining objects we can select our object for box $A_2$ in $i-1$ ways; 
consequently we can use the boxes $A_1, A_2$ in $i(i-1)$ ways; similarly we 
can use the boxes $A_1, A_2, A_3$ in $i(i-1)(i-2)$ ways, and the whole of 
the boxes in $i(i-1)(i-2)\ldots 2 \cdot 1$ or in $i!$ ways. Hence there are 
$i!$ ways of distributing objects of specification $(1^i)$ into boxes of 
specification $(1^i)$. Now suppose $i_1$ of the objects are identical. In any 
distribution certain boxes $i_1$ in number will contain the same objects 
and if we now replace these similar objects by different objects we find 
that we can do this in $i_1!$ ways corresponding to the $i_1!$ permutations of the 
$i_1$ different objects. Hence the former distributions are $i_1!$ times as 
numerous as the latter, and therefore we find that the latter 
distributions can take place in 

$$\frac{i!}{i_1!} \text{ ways.}$$ 

Similarly if other $i_2$ objects be identical the number of distributions is 

$$\frac{i!}{i_1!\,i_2!},$$ 

and finally if the specification of the objects be 

$$(i_1 i_2 i_3 \ldots i_s)$$ 

the number of distributions into boxes of specification $(1^i)$ is 

$$\frac{i!}{i_1!\,i_2!\ldots i_s!}.$$ 

This number therefore enumerates the permutations of objects of 
specification $(i_1 i_2 \ldots i_s)$. 
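
The enumerating number just obtained can be compared with a direct count for small specifications. The sketch below is an illustration, not a substitute for the argument; `count_by_formula` and `count_by_enumeration` are names assumed here, and the kinds of object are represented by the integers 0, 1, 2, ….

```python
from itertools import permutations
from math import factorial, prod


def count_by_formula(spec):
    """i! / (i_1! i_2! ... i_s!) for objects of specification (i_1 i_2 ... i_s)."""
    return factorial(sum(spec)) // prod(factorial(p) for p in spec)


def count_by_enumeration(spec):
    """Distinct ways of placing the objects one in each of i different boxes."""
    objects = [kind for kind, repeats in enumerate(spec) for _ in range(repeats)]
    return len(set(permutations(objects)))


for spec in [(1, 1, 1), (2, 1, 1), (2, 2), (3, 2), (3, 3, 2, 1)]:
    print(spec, count_by_formula(spec), count_by_enumeration(spec))
```

The direct count forms all $i!$ orderings and discards repetitions, so it is practicable only for small $i$; the formula has no such limitation.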


20. In the partition notation the multinomial theorem may be 
written 

$$(1)^i = \sum \frac{i!}{i_1!\,i_2!\ldots i_s!}\,(i_1 i_2 \ldots i_s),$$ 

the summation being for every partition of the number $i$. 

It will be observed that the multinomial theorem involves the 
enumeration of the permutations of all combinations of letters that it is 
possible to form. For this reason it is often said to be the ‘Generating 
Function’ for the enumeration of permutations. Since it also enumerates 
certain distributions it may be said to be the ‘Distribution Function’ 
for the distribution of objects into boxes of specification $(1^i)$, one object 
being placed in each box. 


21. From this first very elementary case of distribution we can at 
once derive an interesting corollary. 

Suppose that we have to distribute $i$ objects into $i+j$ different 
boxes, so that the box specification is $(1^{i+j})$, subject to the condition 
that no box is to contain more than one object. It is clear that in any 
distribution there must be $j$ empty boxes, and that we may place in 
each of them one of $j$ new and identical objects. 

These $j$ new objects have the specification $(j)$. Hence the problem 
before us is transformed into that of distributing objects of specification 

$$(i_1 i_2 \ldots i_s\, j)$$ 

into boxes of specification $(1^{i+j})$. 

The objects and boxes being now equi-numerous we have the case 
already considered and can see that the number of distributions is 

$$\frac{(i+j)!}{i_1!\,i_2!\ldots i_s!\,j!}.$$ 
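
The device of filling the empty boxes with $j$ new identical objects can be tested against a count that never mentions them. The sketch below is illustrative, with names (`count_by_formula`, `count_by_enumeration`) assumed here; the enumeration assigns a distinct box to each object in turn and then disregards the distinction between identical objects, an empty box being recorded as `None`.

```python
from itertools import permutations
from math import factorial, prod


def count_by_formula(spec, j):
    """(i + j)! / (i_1! ... i_s! j!) with i = i_1 + ... + i_s."""
    i = sum(spec)
    return factorial(i + j) // (prod(factorial(p) for p in spec) * factorial(j))


def count_by_enumeration(spec, j):
    """Distributions into i + j different boxes, not more than one object in a box."""
    objects = [kind for kind, repeats in enumerate(spec) for _ in range(repeats)]
    boxes = len(objects) + j
    seen = set()
    for chosen in permutations(range(boxes), len(objects)):
        contents = [None] * boxes          # None marks an empty box
        for obj, box in zip(objects, chosen):
            contents[box] = obj
        seen.add(tuple(contents))
    return len(seen)


for spec, j in [((2, 1, 1), 0), ((2, 1), 1), ((1, 1), 2), ((2, 2), 2)]:
    print(spec, j, count_by_formula(spec, j), count_by_enumeration(spec, j))
```

The first three cases all concern four boxes and give the same number of distributions.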




22. The reader will now observe that we can also pass from the 
latter to the former distribution, and that just as we can add any 
number to the specification of the objects in order to equalise the 
objects and boxes; so conversely, if we are given any specification of 
$i$ objects and boxes of specification $(1^i)$ we can cancel any part from 
the specification of the objects without altering the number of the 
distributions. Thus the distributions of objects $(i_1 i_2 \ldots i_s)$ into boxes $(1^i)$ 
are equi-numerous with the distributions of objects 

$$(i_1 i_2 \ldots i_{r-1}\, i_{r+1} \ldots i_s)$$ 

into boxes $(1^i)$. Here the number $i_r$ is cancelled from the object 
specification, and $r$ may be any of the numbers $1, 2, \ldots, s$. 

As a simple example we find that objects of the specifications 
$(21^2)$, $(1^2)$, $(21)$ have equi-numerous distributions into four different 
boxes, not more than one object in each box. These are 


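$$\begin{array}{cccc@{\qquad}cccc} A_1 & A_2 & A_3 & A_4 & A_1 & A_2 & A_3 & A_4 \\ \alpha & \alpha & \beta & \gamma & \beta & \alpha & \alpha & \gamma \\ \alpha & \alpha & \gamma & \beta & \beta & \alpha & \gamma & \alpha \\ \alpha & \beta & \alpha & \gamma & \beta & \gamma & \alpha & \alpha \\ \alpha & \beta & \gamma & \alpha & \gamma & \alpha & \alpha & \beta \\ \alpha & \gamma & \alpha & \beta & \gamma & \alpha & \beta & \alpha \\ \alpha & \gamma & \beta & \alpha & \gamma & \beta & \alpha & \alpha \end{array}$$ 

$$\begin{array}{cccc@{\qquad}cccc} A_1 & A_2 & A_3 & A_4 & A_1 & A_2 & A_3 & A_4 \\ \alpha & \beta & & & \beta & \alpha & & \\ \alpha & & \beta & & \beta & & \alpha & \\ \alpha & & & \beta & \beta & & & \alpha \\ & \alpha & \beta & & & \beta & \alpha & \\ & \alpha & & \beta & & \beta & & \alpha \\ & & \alpha & \beta & & & \beta & \alpha \end{array}$$ 

$$\begin{array}{cccc@{\qquad}cccc} A_1 & A_2 & A_3 & A_4 & A_1 & A_2 & A_3 & A_4 \\ \alpha & \alpha & \beta & & \beta & \alpha & \alpha & \\ \alpha & \alpha & & \beta & \beta & \alpha & & \alpha \\ \alpha & \beta & \alpha & & \beta & & \alpha & \alpha \\ \alpha & \beta & & \alpha & & \alpha & \alpha & \beta \\ \alpha & & \alpha & \beta & & \alpha & \beta & \alpha \\ \alpha & & \beta & \alpha & & \beta & \alpha & \alpha \end{array}$$ 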




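23. In the case of the binomial theorem another interpretation may 
be given to the coefficients. 

Writing 

$$(\alpha+\beta)^n = \sum \binom{n}{m}\alpha^m\beta^{n-m},$$ 

it has been shewn that the coefficient $\binom{n}{m}$ enumerates the permutations 
of the combination $\alpha^m\beta^{n-m}$. 

We can shew that the same number enumerates the number of ways 
of selecting $m$ letters from an assemblage of $n$ different letters. For 
consider the combination $\alpha^3\beta^2$ and its ten permutations 

$$\begin{array}{ccccc} \alpha\alpha\alpha\beta\beta & \alpha\alpha\beta\alpha\beta & \alpha\alpha\beta\beta\alpha & \alpha\beta\alpha\alpha\beta & \alpha\beta\alpha\beta\alpha \\ \alpha\beta\beta\alpha\alpha & \beta\alpha\alpha\alpha\beta & \beta\alpha\alpha\beta\alpha & \beta\alpha\beta\alpha\alpha & \beta\beta\alpha\alpha\alpha \end{array}$$ 

When an $\alpha$ is in the $s$th place from the left of the permutation 
substitute for it the suffixed $\alpha$, $\alpha_s$; thus obtaining, omitting the letters $\beta$ 
entirely, the ten combinations 

$$\begin{array}{ccccc} \alpha_1\alpha_2\alpha_3 & \alpha_1\alpha_2\alpha_4 & \alpha_1\alpha_2\alpha_5 & \alpha_1\alpha_3\alpha_4 & \alpha_1\alpha_3\alpha_5 \\ \alpha_1\alpha_4\alpha_5 & \alpha_2\alpha_3\alpha_4 & \alpha_2\alpha_3\alpha_5 & \alpha_2\alpha_4\alpha_5 & \alpha_3\alpha_4\alpha_5 \end{array}$$ 

which constitute the ten combinations three together that can be 
formed from the letters of the combination $\alpha_1\alpha_2\alpha_3\alpha_4\alpha_5$. If we had 
operated similarly with the letters $\beta$ we would have reached the ten 
combinations two together that can be formed from the same 
combination of five letters, viz. 

$$\begin{array}{ccccc} \alpha_4\alpha_5 & \alpha_3\alpha_5 & \alpha_3\alpha_4 & \alpha_2\alpha_5 & \alpha_2\alpha_4 \\ \alpha_2\alpha_3 & \alpha_1\alpha_5 & \alpha_1\alpha_4 & \alpha_1\alpha_3 & \alpha_1\alpha_2 \end{array}$$ 

and we have no difficulty in realising that the number $\binom{n}{m} = \binom{n}{n-m}$ 
enumerates the combinations $m$ or $n - m$ together that can be formed 
from $n$ different letters. 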
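
The passage from permutations to selections can be carried out by recording the places occupied by each letter. In the following sketch, which is illustrative only, `a` and `b` stand for $\alpha$ and $\beta$, places are numbered from 1, and the helper name `places` is an assumption made here.

```python
from itertools import combinations, permutations


def places(word, letter):
    """Places, counted from the left beginning at 1, in which the letter stands."""
    return tuple(s for s, ch in enumerate(word, start=1) if ch == letter)


# The ten permutations of alpha^3 beta^2 in dictionary order.
perms = sorted(set(permutations("aaabb")))

three_together = [places(p, "a") for p in perms]
two_together = [places(p, "b") for p in perms]

print(three_together)
print(two_together)
print(three_together == list(combinations(range(1, 6), 3)))
```

Each tuple of places names a combination of suffixed letters; for instance `(1, 2, 4)` answers to $\alpha_1\alpha_2\alpha_4$.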

24. So far we have been concerned mainly with a distribution into 
different boxes, one object only being placed in each box. The results 
have been trivial, but they have supplied a connecting link between 
combinatory analysis and the algebra of symmetric functions. It will 
be shewn in what follows that this relationship can be greatly extended 
to the mutual advantage of combinatory analysis and the symmetrical 
algebra. 

We proceed in the first place by removing the restriction that each 
box is to contain only one object. We consider distributions in which 




the number of boxes is less than the number of objects, so that some 
box or boxes must contain more than one object. We may join the 
issue in two ways. We may precisely define the distribution and then 
seek its connexion with the algebra ; or we may set forth some com- 
bination of symmetric functions, which we can see will lead to a 
distribution of the required kind, and then seek to define the corre- 
sponding distribution. For the present we adopt the latter procedure 
and inquire into the development of the function 

(17 s 

which for four quantities may be written 

(a /3 + ay + a8 + / 3 y + /38 + yS)\ 

We carry out the multiplication of the i factors according to the 
process explained in Art. 14 . The complete product is clearly a sym- 
metric function, expressible as a linear function of monomials, of the 
weight 2 i, of the form 

$\sum \alpha_1^{i_1}\alpha_2^{i_2}\cdots\alpha_s^{i_s} = (i_1 i_2 \ldots i_s)$.

The monomial function just written will appear with a certain co- 
efficient. What is the meaning of that coefficient in the theory of 
distributions? 

In the process of multiplication we take any combination of two 
letters from the first factor with any combination from the second, and 
so on, until finally we take any combination from the $i$th factor and, 
assembling the letters thus obtained, we obtain, we suppose, a 
combination of letters 

$\alpha_1^{i_1}\alpha_2^{i_2}\cdots\alpha_s^{i_s}$.

Associated with this step in the multiplication we take $i$ different 
boxes, that is to say of specification $(1^i)$, 

$A_1 A_2 A_3 \ldots A_i$

corresponding to the $i$ factors of the product in order from left to right 
and place the two-letter combinations which have been selected from 
the 1st, 2nd, ... $i$th factors in the boxes $A_1, A_2, \ldots A_i$ respectively. 
We thus arrive at a distribution of the letters in 
$\alpha_1^{i_1}\alpha_2^{i_2}\cdots\alpha_s^{i_s}$ into 
$i$ different boxes, subject to the sole condition that each box is to 
contain two different letters; or, as we may say, letters of specification 
$(1^2)$. Making a similar distribution in correspondence with every 
selective step in the multiplication, which produces the combination 
$\alpha_1^{i_1}\alpha_2^{i_2}\cdots\alpha_s^{i_s}$, we reach a set of 
distributions which constitute the whole number of ways of distributing 
a definite set of objects of specification $(i_1 i_2 \ldots i_s)$ into 
boxes of specification $(1^i)$ in such wise that every box 
contains objects of specification $(1^2)$. Since all the terms included in 
$\sum \alpha_1^{i_1}\alpha_2^{i_2}\cdots\alpha_s^{i_s}$, regarded as 
denoting objects, have the same specification 
we say that the number of the distributions above defined is equal to 
the coefficient of symmetric function 

$(i_1 i_2 \ldots i_s)$

in the development of the symmetric function 

$(1^2)^i$.

Some examples supply simple verifications. 

By ordinary multiplication we find that 

$(\alpha\beta + \alpha\gamma + \alpha\delta + \beta\gamma + \beta\delta + \gamma\delta)^2 = \sum\alpha^2\beta^2 + 2\sum\alpha^2\beta\gamma + 6\sum\alpha\beta\gamma\delta$,

or $(1^2)^2 = (2^2) + 2(21^2) + 6(1^4)$.


The distributions which give the three coefficients 1, 2, 6 are 
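(each pair gives the contents of $A_1$, $A_2$ in order) 

for $\alpha^2\beta^2$: $(\alpha\beta, \alpha\beta)$;

for $\alpha^2\beta\gamma$: $(\alpha\beta, \alpha\gamma)$, $(\alpha\gamma, \alpha\beta)$;

for $\alpha\beta\gamma\delta$: $(\alpha\beta, \gamma\delta)$, $(\alpha\gamma, \beta\delta)$, $(\alpha\delta, \beta\gamma)$,
$(\beta\gamma, \alpha\delta)$, $(\beta\delta, \alpha\gamma)$, $(\gamma\delta, \alpha\beta)$.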

Again, by developing 

$(\sum\alpha\beta)^3 = (1^2)^3$

we find a term $15\sum\alpha^2\beta^2\gamma\delta = 15\,(2^21^2)$.


The fifteen distributions are 

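$A_1$  $A_2$  $A_3$

$\alpha\beta$  $\alpha\gamma$  $\beta\delta$
$\alpha\beta$  $\beta\delta$  $\alpha\gamma$
$\alpha\gamma$  $\alpha\beta$  $\beta\delta$
$\alpha\gamma$  $\beta\delta$  $\alpha\beta$
$\beta\delta$  $\alpha\beta$  $\alpha\gamma$
$\beta\delta$  $\alpha\gamma$  $\alpha\beta$

$\alpha\beta$  $\alpha\delta$  $\beta\gamma$
$\alpha\beta$  $\beta\gamma$  $\alpha\delta$
$\alpha\delta$  $\alpha\beta$  $\beta\gamma$
$\alpha\delta$  $\beta\gamma$  $\alpha\beta$
$\beta\gamma$  $\alpha\beta$  $\alpha\delta$
$\beta\gamma$  $\alpha\delta$  $\alpha\beta$

$\alpha\beta$  $\alpha\beta$  $\gamma\delta$
$\alpha\beta$  $\gamma\delta$  $\alpha\beta$
$\gamma\delta$  $\alpha\beta$  $\alpha\beta$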


There is no simple expression for the general coefficient in the 
development of $(\sum\alpha\beta)^i$; but when $i$ is not too large there is a method 
of arriving at the value of any desired coefficient which will be given at 
a later stage. 
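
For small $i$ the coefficients can also be found by carrying out the 
selective process directly. The following is a minimal brute-force 
sketch, not the method promised above; the function name 
`coeff_pairs_power` and the use of integers as stand-ins for the letters 
are assumptions made for illustration. 

```python
from collections import Counter
from itertools import combinations, product

def coeff_pairs_power(exponents, i):
    """Count the ordered choices of i two-letter combinations (one per
    box A_1 ... A_i) whose letters assemble into the monomial having
    the given exponents, one exponent per letter."""
    n = len(exponents)
    pairs = list(combinations(range(n), 2))      # terms of sum(alpha*beta)
    target = Counter({k: e for k, e in enumerate(exponents) if e})
    count = 0
    for choice in product(pairs, repeat=i):      # one term from each factor
        assembled = Counter(letter for pair in choice for letter in pair)
        if assembled == target:
            count += 1
    return count

# Coefficients of (2^2), (21^2), (1^4) in (1^2)^2, and of (2^2 1^2) in (1^2)^3.
print(coeff_pairs_power((2, 2, 0, 0), 2),
      coeff_pairs_power((2, 1, 1, 0), 2),
      coeff_pairs_power((1, 1, 1, 1), 2),
      coeff_pairs_power((2, 2, 1, 1), 3))
```

The running time grows as the $i$th power of the number of two-letter 
combinations, so the sketch is only suited to verifying small cases. 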




25. We pass on to consider the symmetric function product 

$(\sum\alpha\beta)^i(\sum\alpha)^j = (1^2)^i(1)^j$.

We write out the i factors followed by the j factors and proceed to 
obtain one term in the development by taking one combination of two 
letters from each of the first i factors and one letter from each of the 
last j factors. Assembling the letters so obtained we reach, suppose, 
a combination of $2i + j$ letters 

$\alpha_1^{p_1}\alpha_2^{p_2}\cdots\alpha_s^{p_s}$.

In correspondence with the selective process that has resulted in 
this combination we take $i + j$ different boxes, so that the box 
specification is $(1^{i+j})$ 

$A_1A_2A_3 \ldots A_i\,B_1B_2B_3 \ldots B_j$.

We place the two-letter combinations that were selected from the 
1st, 2nd, ... $i$th factors in the boxes $A_1, A_2, \ldots A_i$ respectively; and the 
single letters that were selected from the last $j$ factors in the boxes 
$B_1, B_2, \ldots B_j$ respectively. If we make a similar distribution for every 
case in which the selective process in the multiplication results in the 
combination $\alpha_1^{p_1}\alpha_2^{p_2}\cdots\alpha_s^{p_s}$ we will have 
obtained every distribution of a 
definite set of objects of specification $(p_1p_2 \ldots p_s)$ into boxes of 
specification $(1^{i+j})$ in such wise that in regard to $i$ of the boxes 
$A_1, A_2, \ldots A_i$ 
each box contains objects of specification $(1^2)$, and in the remaining 
boxes $B_1, B_2, \ldots B_j$ each box contains a single object. Hence we gather 
that distributions so specified are enumerated by the coefficient of the 
function $(p_1p_2 \ldots p_s)$ in the development of the product 

$(1^2)^i(1)^j$.

In the distribution above defined the reader must notice that objects 
of specifications $(1^2)$, $(1)$ are restricted to the boxes $A_1, A_2, \ldots$; 
$B_1, B_2, \ldots$ respectively. This implies that the boxes being in a 
definite order the $i + j$ combinations of objects are only allowed 
$i!\,j!$ permutations; that is to say that no exchange of combinations of 
objects of different specifications is allowed to take place. If such 
exchange be permitted $(i + j)!$ 
permutations between the combinations of objects may take place. The 
function that by its development enumerates the distributions must 
now be multiplied by 

$\binom{i+j}{i}$

and we have the theorem: 

“If objects of specification $(p_1p_2 \ldots p_s)$ be distributed into boxes of 
specification $(1^{i+j})$ in such wise that $i$ of the boxes (unspecified) receive 
objects of specification $(1^2)$ and the remaining boxes objects of 
specification $(1)$, the number of distributions is equal to the coefficient of 
the function $(p_1p_2 \ldots p_s)$ in the development of the function 

$\binom{i+j}{i}(1^2)^i(1)^j$.”

As an example it is found that 

$\binom{4}{2}(1^2)^2(1)^2 = \ldots + 48\,(321) + \ldots$.

The 48 distributions into the boxes $A_1\ A_2\ A_3\ A_4$ are 

the 12 permutations of $\alpha\beta$, $\alpha\beta$, $\alpha$, $\gamma$,
the 24 permutations of $\alpha\beta$, $\alpha\gamma$, $\alpha$, $\beta$,
the 12 permutations of $\alpha\beta$, $\beta\gamma$, $\alpha$, $\alpha$.
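
The number 48 can be confirmed by listing the distributions outright. 
The sketch below is illustrative only: it assumes four different boxes, 
writes `a`, `b`, `c` for $\alpha$, $\beta$, $\gamma$, and uses names 
(`target`, `contents`, `count`) that are not in the text. 

```python
from collections import Counter
from itertools import combinations, product

letters = "abc"                      # stand-ins for alpha, beta, gamma
target = Counter("aaabbc")           # objects of specification (321)

pairs = ["".join(p) for p in combinations(letters, 2)]  # specification (1^2)
singles = list(letters)                                  # specification (1)
contents = pairs + singles

count = 0
for boxes in product(contents, repeat=4):      # contents of A_1 ... A_4
    if sum(len(b) == 2 for b in boxes) != 2:   # exactly two boxes hold a pair
        continue
    if Counter("".join(boxes)) == target:      # all six objects are used
        count += 1

print(count)
```

Because the boxes holding pairs are left unspecified, every arrangement 
of two pair-boxes and two single-boxes is admitted, which is the effect 
of the multiplier $\binom{4}{2}$. 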

26. It is quite evident that the process by which we have reached 
this connecting link between distributions and the expansion of sym- 
metric function products is of general application. The selective process 
is in correspondence with distribution when the factors of the symmetric 
function products are any monomial symmetric functions whatever. 

For consider the product 

$(\sum\alpha_1^{m_1}\alpha_2^{m_2}\cdots\alpha_t^{m_t})^i\,(\sum\alpha_1^{n_1}\alpha_2^{n_2}\cdots\alpha_u^{n_u})^j = (m_1m_2 \ldots m_t)^i\,(n_1n_2 \ldots n_u)^j$.

We write out the $i$ factors followed by the $j$ factors and obtain one 
term in the development by taking one term from each of the $i + j$ 
factors. The $i$ terms from the first $i$ factors are each of them 
combinations of specification $(m_1m_2 \ldots m_t)$. The $j$ terms from the 
last $j$ factors are each of them of specification $(n_1n_2 \ldots n_u)$. 
The assemblage of $i + j$ terms is, suppose, 

$\alpha_1^{p_1}\alpha_2^{p_2}\cdots\alpha_s^{p_s}$ of specification $(p_1p_2 \ldots p_s)$.

In correspondence with the selective process we take $i + j$ boxes of 
specification $(1^{i+j})$ 

$A_1A_2 \ldots A_i\,B_1B_2 \ldots B_j$.

We place the combinations that have been selected from the first i 
factors in the boxes A respectively and the remaining combinations in 
the boxes B. 

If we make a similar distribution for every case in which the selective 
process results in the combination 
$\alpha_1^{p_1}\alpha_2^{p_2}\cdots\alpha_s^{p_s}$ we will have 
obtained every distribution of a definite set of objects of specification 
$(p_1p_2 \ldots p_s)$ into boxes of specification $(1^{i+j})$ subject to the condition 
that the combinations of specifications $(m_1m_2 \ldots m_t)$, $(n_1n_2 \ldots n_u)$ must 
be placed in the boxes $A$, $B$ respectively. 

Removing this condition we find as before a theorem: 

“If objects of specification $(p_1p_2 \ldots p_s)$ be distributed into boxes of 
specification $(1^{i+j})$ in such wise that $i$ of the boxes (unspecified) receive 
objects of specification $(m_1m_2 \ldots m_t)$ and the remaining boxes objects of 
specification $(n_1n_2 \ldots n_u)$, the number of distributions is equal to the 
coefficient of the function $(p_1p_2 \ldots p_s)$ in the development of the 
function 

$\binom{i+j}{i}(m_1m_2 \ldots m_t)^i\,(n_1n_2 \ldots n_u)^j$.”


27. The same reasoning applies when any number of monomial 
symmetric functions are multiplied together and we may enunciate 
the general theorem : — 

“If objects of specification $(p_1p_2 \ldots p_s)$ be distributed into boxes of 
specification $(1^{i+j+k+\ldots})$ in such wise that $i$ unspecified boxes receive 
objects of specification $(m_1m_2 \ldots m_t)$, $j$ other unspecified boxes objects 
of specification $(n_1n_2 \ldots n_u)$, $k$ other unspecified boxes objects of 
specification $(o_1o_2 \ldots o_v)$, etc., the number of distributions is equal to the 
coefficient of the function $(p_1p_2 \ldots p_s)$ in the development of the function 

$\dfrac{(i+j+k+\ldots)!}{i!\,j!\,k!\ldots}\,(m_1m_2 \ldots m_t)^i\,(n_1n_2 \ldots n_u)^j\,(o_1o_2 \ldots o_v)^k \ldots$.”

Verifications may be made by means of the formula 

$(\sum\alpha^2\beta\gamma)(\sum\alpha\beta\gamma) = \sum\alpha^3\beta^2\gamma^2 + 2\sum\alpha^3\beta^2\gamma\delta + 6\sum\alpha^3\beta\gamma\delta\epsilon$

$\quad + 3\sum\alpha^2\beta^2\gamma^2\delta + 6\sum\alpha^2\beta^2\gamma\delta\epsilon + 10\sum\alpha^2\beta\gamma\delta\epsilon\theta$,

otherwise written 

$(21^2)(1^3) = (32^2) + 2\,(321^2) + 6\,(31^4) + 3\,(2^31) + 6\,(2^21^3) + 10\,(21^5)$.
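
A product of two monomial symmetric functions can be developed by 
direct counting, which gives an independent check of the six 
coefficients. The sketch below is an assumption-laden illustration: 
partitions are written as tuples, and the names `distinct_arrangements` 
and `product_coefficient` are invented for the purpose. 

```python
from itertools import permutations

def distinct_arrangements(partition, n):
    """All distinct exponent vectors on n letters belonging to the
    monomial symmetric function of the given partition."""
    padded = tuple(partition) + (0,) * (n - len(partition))
    return set(permutations(padded))

def product_coefficient(mu, nu, lam):
    """Coefficient of the function (lam) in the product (mu)(nu): the
    number of ways the fixed monomial with exponents lam splits into a
    term of (mu) times a term of (nu)."""
    n = len(lam)
    if len(mu) > n or len(nu) > n:
        return 0
    second = distinct_arrangements(nu, n)
    return sum(
        tuple(l - a for l, a in zip(lam, first)) in second
        for first in distinct_arrangements(mu, n)
    )

for lam in [(3, 2, 2), (3, 2, 1, 1), (3, 1, 1, 1, 1),
            (2, 2, 2, 1), (2, 2, 1, 1, 1), (2, 1, 1, 1, 1, 1)]:
    print(lam, product_coefficient((2, 1, 1), (1, 1, 1), lam))
```

Only as many letters as the target partition has parts need be 
considered, since the chosen monomial involves no others. 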


28. As it is important to be able to obtain readily the numerical 
values of such coefficients, we will subject this particular development 
to examination with the object of deducing general laws in the algebra 
of symmetric functions. 

Suppose that the symmetric functions appertain to an unlimited 
number of quantities $\alpha$, $\beta$, $\gamma$, ... and expand each side of the identity in 
powers of one of them, say $\alpha$. The function $(32^2)$ or 
$\sum\alpha^3\beta^2\gamma^2$ involves 
some terms which do not contain $\alpha$; terms such as $\beta^3\gamma^2\delta^2$ for example. 
The aggregate of these terms is $(32^2)$ regarded as appertaining to the set 
of quantities $\beta$, $\gamma$, $\delta$, ..., the original set with the omission of $\alpha$. The 
function involves no terms containing the first power of $\alpha$, but it has 
terms such as $\alpha^2\beta^3\gamma^2$ which contain the second power of $\alpha$, the aggregate 
of which is $\alpha^2(32)$, if $(32)$ now appertains to the set $\beta$, $\gamma$, $\delta$, .... Lastly 
it involves terms $\alpha^3(2^2)$, where $(2^2)$ refers to the set $\beta$, $\gamma$, $\delta$, .... 

Hence we may write 

$(32^2) = (32^2)' + \alpha^2(32)' + \alpha^3(2^2)'$,

the dashed round bracket denoting that the symmetric functions refer 
to the deficient set of quantities $\beta$, $\gamma$, $\delta$, .... 

Similarly 

$(321^2) = (321^2)' + \alpha(321)' + \alpha^2(31^2)' + \alpha^3(21^2)'$,

$(31^4) = (31^4)' + \alpha(31^3)' + \alpha^3(1^4)'$,

$(2^31) = (2^31)' + \alpha(2^3)' + \alpha^2(2^21)'$,

$(2^21^3) = (2^21^3)' + \alpha(2^21^2)' + \alpha^2(21^3)'$,

$(21^5) = (21^5)' + \alpha(21^4)' + \alpha^2(1^5)'$.

The right-hand side of the identity may therefore be written 

$(32^2)' + 2\,(321^2)' + 6\,(31^4)' + 3\,(2^31)' + 6\,(2^21^3)' + 10\,(21^5)'$

$+ \alpha\,\{2\,(321)' + 6\,(31^3)' + 3\,(2^3)' + 6\,(2^21^2)' + 10\,(21^4)'\}$

$+ \alpha^2\{(32)' + 2\,(31^2)' + 3\,(2^21)' + 6\,(21^3)' + 10\,(1^5)'\}$

$+ \alpha^3\{(2^2)' + 2\,(21^2)' + 6\,(1^4)'\}$.

As regards the left-hand side, since 

$(21^2) = (21^2)' + \alpha(21)' + \alpha^2(1^2)'$,

$(1^3) = (1^3)' + \alpha(1^2)'$,

we find that 

$(21^2)(1^3) = (21^2)'(1^3)' + \alpha\,\{(21)'(1^3)' + (21^2)'(1^2)'\}$

$\quad + \alpha^2\{(1^2)'(1^3)' + (21)'(1^2)'\} + \alpha^3(1^2)'(1^2)'$.

Now equating the coefficients of like powers of $\alpha$ (omitting the case 
$\alpha^0$) and suppressing the dashes to the round brackets by converting the 
set of quantities $\beta$, $\gamma$, $\delta$, ... into the set $\alpha$, $\beta$, $\gamma$, ... through writing 
$\alpha$, $\beta$, $\gamma$, ... for $\beta$, $\gamma$, $\delta$, ... respectively, we obtain the derived formulae 

$(21)(1^3) + (21^2)(1^2) = 2\,(321) + 6\,(31^3) + 3\,(2^3) + 6\,(2^21^2) + 10\,(21^4)$,

$(1^2)(1^3) + (21)(1^2) = (32) + 2\,(31^2) + 3\,(2^21) + 6\,(21^3) + 10\,(1^5)$,

$(1^2)^2 = (2^2) + 2\,(21^2) + 6\,(1^4)$.

Thus we can derive, from any given identity, a number of other identi- 
ties of lower weights. The very simple process is that of expansion in 
ascending powers of the quantity a. 

We observe that the coefficient of $\alpha^m$ in any monomial function is 
obtained by merely deleting the part $m$ from the partition which denotes 
the function; if the part $m$ be not present the coefficient is zero. Observe 
also that in the product $(21^2)(1^3)$ the highest power of $\alpha$ that presents 
itself is 3 because 2, 1 are the largest parts in the factors respectively 
and $2 + 1 = 3$. It follows at once that the coefficient of $\alpha^3$ in the product 
is found by simply obliterating the first or largest part in each factor. 
We thus arrive at the coefficient $(1^2)^2$. Thus from the original identity 
$(21^2)(1^3) = (32^2) + 2\,(321^2) + 6\,(31^4) +$ other terms which involve no part, 
in the partitions, as large as 3, we derive, at sight, the new identity 
$(1^2)^2 = (2^2) + 2\,(21^2) + 6\,(1^4)$.

From this we discover immediately new theorems in distribution. 
As an example, since 

$2\,(21^2)(1^3) = \ldots + 12\,(31^4) + \ldots$,

$(1^2)^2 = \ldots + 6\,(1^4) + \ldots$,

we can assert that the number of distributions of objects of specification 
$(31^4)$ into boxes of specification $(1^2)$ in such wise that the boxes contain 
objects of specification $(21^2)$ and $(1^3)$ is twice the number of 
distributions of objects of specification $(1^4)$ into boxes of specification $(1^2)$ in 
such wise that both boxes contain objects of specification $(1^2)$. 

Examination of the distributions verifies this conclusion and the 
theory we are now discussing might have been entirely based upon a 
study of the distributions. 

29. In order to facilitate the process of taking the coefficient of $\alpha^m$ 
in a symmetric function it is convenient to adopt a mathematical 
shorthand. Let the symbol 

$D_m$,

placed before any symmetric function, stand for the phrase 
‘the coefficient of $\alpha^m$ in.’ 

Then when $D_m$ is prefixed to a monomial function expressed in the 
partition notation, the result is the deletion of the part $m$ from the 
partition; if the part $m$ be not present the result is zero; if $m$ itself be 
zero the result is to leave the function unaltered or, as we may say, to 
multiply the function by unity. For example 

$D_3(32^2) = (2^2)$; $D_2(32^2) = (32)$; $D_3(3) = 1$; 

$D_1(32^2) = D_4(32^2) = 0$; $D_0(32^2) = (32^2)$*. 

* It should be stated that the reader who is acquainted with the differential 
calculus will realise that $D_m$ is effectively a partial differential operator of the 
order $m$ which is expressible by means of symmetric functions in a variety of ways 
and, in particular, in terms of the elementary functions $(1)$, $(1^2)$, $(1^3)$, ... which 
have been denoted above also by $a_1$, $a_2$, $a_3$, .... 

It was brought to light in 1883, Proc. Lond. Math. Soc., by James Hammond 
and is freely used in ‘Combinatory Analysis’ and in many researches by the author 
which have been published in Scientific Journals during the past thirty years. The 
methods of the calculus are not necessary for this elementary exposition and the 
requisite properties of the symbol will be set forth without its aid. 
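
The rule of operation on a single monomial function is simple enough to 
state as a few lines of code. This is a sketch under stated assumptions: 
a partition is a tuple of parts, the empty tuple stands for unity, and 
`None` stands for a zero result; the function name `D` merely echoes the 
symbol of the text. 

```python
def D(m, partition):
    """Apply the symbol D_m to the monomial function whose partition is
    given: delete one part m, or return None when the result is zero."""
    if m == 0:
        return tuple(partition)      # D_0 leaves the function unaltered
    parts = list(partition)
    if m not in parts:
        return None                  # part m absent: the result is zero
    parts.remove(m)                  # delete a single part equal to m
    return tuple(parts)

print(D(3, (3, 2, 2)), D(2, (3, 2, 2)), D(3, (3,)), D(1, (3, 2, 2)))
```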




30. Any symmetric function $F$ may be written in ascending powers 
of the quantity $\alpha$ in the form 

$D_0F + \alpha D_1F + \alpha^2D_2F + \ldots$,

in accordance with the definition of $D_mF$. 

Hence the product of two functions $F_1$, $F_2$ is 

$(D_0F_1 + \alpha D_1F_1 + \alpha^2D_2F_1 + \ldots)(D_0F_2 + \alpha D_1F_2 + \alpha^2D_2F_2 + \ldots)$,

or 

$D_0F_1\,D_0F_2$

$+ \alpha\,(D_1F_1\,D_0F_2 + D_0F_1\,D_1F_2)$

$+ \alpha^2(D_2F_1\,D_0F_2 + D_1F_1\,D_1F_2 + D_0F_1\,D_2F_2)$

$+ \alpha^3(D_3F_1\,D_0F_2 + D_2F_1\,D_1F_2 + D_1F_1\,D_2F_2 + D_0F_1\,D_3F_2)$

$+ \ldots$.

Moreover 

$F_1F_2 = D_0(F_1F_2) + \alpha D_1(F_1F_2) + \alpha^2D_2(F_1F_2) + \ldots$.

Whence comparing the coefficients of $\alpha$, $\alpha^2$, $\alpha^3$, ..., 

$D_1(F_1F_2) = D_1F_1 \cdot D_0F_2 + D_0F_1 \cdot D_1F_2$,

$D_2(F_1F_2) = D_2F_1 \cdot D_0F_2 + D_1F_1 \cdot D_1F_2 + D_0F_1 \cdot D_2F_2$,

$D_3(F_1F_2) = D_3F_1 \cdot D_0F_2 + D_2F_1 \cdot D_1F_2 + D_1F_1 \cdot D_2F_2 + D_0F_1 \cdot D_3F_2$,

$D_m(F_1F_2) = \sum D_sF_1 \cdot D_{m-s}F_2$,

where on the right-hand side there is a term in correspondence with 
every composition (see Art. 10) of the number m, zero counting as a 
part. There are visibly m + 1 terms, but usually fewer than m+ 1 will 
materialise because by the rules of operation many terms may vanish. 

Similarly if we require the coefficient of $\alpha^m$ in the product of three 
functions 

$F_1F_2F_3$,

the performance of the symbol $D_m$ will involve a term 

$D_sF_1 \cdot D_tF_2 \cdot D_{m-s-t}F_3$,

because one step in the multiplication is to find the coefficients of 
$\alpha^s$, $\alpha^t$, $\alpha^{m-s-t}$ in $F_1$, $F_2$, $F_3$ respectively, and then to 
multiply the three coefficients together. 

Hence 

$D_m(F_1F_2F_3) = \sum_{s=0}\sum_{t=0} D_sF_1 \cdot D_tF_2 \cdot D_{m-s-t}F_3$.

Since $s$, $t$, $m-s-t$ is a composition of the number $m$ into three parts, 
zero counting as a part, the symbol $D_m$ breaks up into as many triads 
of symbols as the number $m$ possesses compositions into three parts, 
zero counting as a part. The reader will have little difficulty in proving 
that the number of these compositions is 

$\binom{m+2}{2}$.

In general, when the symbol $D_m$ is prefixed to a product of $i$ 
symmetric functions, it breaks up into as many $i$-ads of symbols as the 
number $m$ possesses compositions into $i$ parts, zero counting as a part. 
The number of such compositions is 

$\binom{m+i-1}{i-1}$.
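
The count of compositions of $m$ into $i$ parts, zero counting as a part, 
is the binomial number $\binom{m+i-1}{i-1}$, and this can be checked by 
listing the compositions. The helper name `compositions` below is an 
assumption made for illustration. 

```python
from itertools import product
from math import comb

def compositions(m, i):
    """Ordered i-tuples of non-negative integers whose sum is m."""
    return [c for c in product(range(m + 1), repeat=i) if sum(c) == m]

# Triads for m = 5, and the general count for a few small cases.
print(len(compositions(5, 3)), comb(5 + 2, 2))
print(all(len(compositions(m, i)) == comb(m + i - 1, i - 1)
          for m in range(6) for i in range(1, 5)))
```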

31. We can now see the importance of the study of the symbol, for 
evidently we can repeatedly operate with it, varying its suffix as may 
be desired, until a positive integer or zero is reached, and thus solve 
the problem of the multiplication of symmetric functions upon which 
the present view of combinatory analysis depends. For consider the 
product 

$(21^2)(1^3)$,

we have 

$D_3(21^2)(1^3) = D_2(21^2) \cdot D_1(1^3) = (1^2)(1^2)$,

$D_2D_3(21^2)(1^3) = D_2(1^2)(1^2) = D_1(1^2) \cdot D_1(1^2) = (1)(1)$,

$D_1D_2D_3(21^2)(1^3) = D_1(1) \cdot D_0(1) + D_0(1) \cdot D_1(1) = 2\,(1)$,

and finally 

$D_1^2D_2D_3(21^2)(1^3) = 2D_1(1) = 2$.

Now we may write 

$(21^2)(1^3) = \ldots + C\,(321^2) + \ldots$,

so that operating upon both sides with $D_1^2D_2D_3$ the right-hand side 
becomes $C$ since every other term is reduced to zero by the operation. 
The calculation above shews that the left-hand side becomes 2 by the 
operation. Hence $C = 2$, and 

$(21^2)(1^3) = \ldots + 2\,(321^2) + \ldots$.

We can in this way calculate the result of the product of any 
number of monomial functions and thus evaluate the number which 
enumerates a well-defined distribution of objects into boxes. 
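
The repeated operation lends itself to mechanical treatment. In the 
sketch below, which is an illustration under assumptions and not the 
author's own scheme, a product of monomial functions is held as a tuple 
of partitions, a sum of such products as a `Counter`, and the names 
`delete_part`, `apply_D` and `coefficient` are invented. 

```python
from collections import Counter
from itertools import product

def delete_part(partition, m):
    """D_m on one monomial function; None stands for a zero result."""
    if m == 0:
        return partition
    if m not in partition:
        return None
    parts = list(partition)
    parts.remove(m)
    return tuple(parts)

def apply_D(m, poly):
    """D_m on a sum of products: one term for every composition of m
    (zero counting as a part) among the factors of each product."""
    out = Counter()
    for factors, c in poly.items():
        for comp in product(range(m + 1), repeat=len(factors)):
            if sum(comp) != m:
                continue
            new = [delete_part(f, s) for f, s in zip(factors, comp)]
            if all(f is not None for f in new):
                out[tuple(new)] += c
    return out

def coefficient(factors, lam):
    """Coefficient of the function (lam) in the product of the factors."""
    poly = Counter({tuple(factors): 1})
    for part in lam:                  # operate with D for each part of lam
        poly = apply_D(part, poly)
    return poly[tuple(() for _ in factors)]   # every factor reduced to unity

print(coefficient([(2, 1, 1), (1, 1, 1)], (3, 2, 1, 1)))
```

The call shown carries out the four operations $D_3$, $D_2$, $D_1$, $D_1$ 
upon $(21^2)(1^3)$ in turn. 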


CHAPTER III 


DISTRIBUTION INTO DIFFERENT BOXES 

32. The theory set forth in the foregoing chapters enables us to 
make a great advance in combinatory analysis. 

We are now able to attack the following problem. 

Objects of any given specification are to be distributed into $m$ 
different boxes, i.e. of specification $(1^m)$; in how many ways can the 
distribution be made? 

First consider the case of two boxes, denoted by $A_1$, $A_2$, and let the 
objects be $w$ in number. It has been shewn in Art. 26 that if the 
specification of the objects be $(p_1p_2 \ldots p_s)$ and the boxes are obliged to contain 
objects of specifications $(m_1m_2 \ldots m_t)$, $(n_1n_2 \ldots n_u)$, both specifications 
appearing, one in each box, the enumerating symmetric function product 
is $2\,(m_1m_2 \ldots m_t)(n_1n_2 \ldots n_u)$ or $(m_1m_2 \ldots m_t)^2$ if the partitions 
$(m_1m_2 \ldots m_t)$, $(n_1n_2 \ldots n_u)$ be identical. 

We have merely to develop the product and seek the coefficient of 
the function $(p_1p_2 \ldots p_s)$. 

We now abolish the restriction and substitute another, viz. that the 
boxes are to receive, one of them $w_1$ objects and the other $w_2$ objects. 
We have 

$w_1 + w_2 = w$;

the $w$ objects may have any specification and the $w_1$ and $w_2$ objects 
may have any specifications consistent with the condition that the 
assemblage of $w_1$ and $w_2$ objects must have the same specification as 
the $w$ objects. If the specification of the $w$ objects which are to be 
distributed be unknown the $w_1$ and $w_2$ objects may have specifications 
denoted by any partitions of $w_1$ and $w_2$ respectively. The $w_1$ objects 
may have therefore all specifications included in the function $h_{w_1}$, the 
$w_2$ objects all those included in the function $h_{w_2}$. If we form the 
functions 

$2h_{w_1}h_{w_2}$ or $h_{w_1}^2$,

according as $w_1$, $w_2$ are not or are equal, we obtain, upon multiplication, 
terms of the forms 

$2\,(m_1m_2 \ldots m_t)(n_1n_2 \ldots n_u)$ or $(m_1m_2 \ldots m_t)^2$,

and it has been shewn already that these functions enumerate, on 
development, the distributions which are associated with the particular 
specifications $(m_1m_2 \ldots m_t)$, $(n_1n_2 \ldots n_u)$. 




As an example let us distribute 4 objects into two different boxes 
so that one box, unspecified, contains 3 objects and the other box 
1 object. 

We have 

$2h_3h_1 = 2\,\{(3) + (21) + (1^3)\}\,(1)$

$= 2\,(4) + 4\,(31) + 4\,(2^2) + 6\,(21^2) + 8\,(1^4)$,

leading to the conclusion that objects of specification $(21^2)$ can be 
distributed in 6 ways and similarly when the objects have other 
specifications. 


The distributions for all of the cases are : 


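(each entry gives the contents of the boxes in the form $A_1 \mid A_2$) 

Spec. $(4)$: $\alpha^3 \mid \alpha$, $\alpha \mid \alpha^3$. No. 2 

Spec. $(31)$: $\alpha^3 \mid \beta$, $\beta \mid \alpha^3$, $\alpha^2\beta \mid \alpha$, $\alpha \mid \alpha^2\beta$. No. 4 

Spec. $(2^2)$: $\alpha^2\beta \mid \beta$, $\beta \mid \alpha^2\beta$, $\alpha\beta^2 \mid \alpha$, $\alpha \mid \alpha\beta^2$. No. 4 

Spec. $(21^2)$: $\alpha\beta\gamma \mid \alpha$, $\alpha \mid \alpha\beta\gamma$, $\alpha^2\beta \mid \gamma$, $\gamma \mid \alpha^2\beta$,
$\alpha^2\gamma \mid \beta$, $\beta \mid \alpha^2\gamma$. No. 6 

Spec. $(1^4)$: $\alpha\beta\gamma \mid \delta$, $\delta \mid \alpha\beta\gamma$, $\alpha\beta\delta \mid \gamma$, $\gamma \mid \alpha\beta\delta$,
$\alpha\gamma\delta \mid \beta$, $\beta \mid \alpha\gamma\delta$, $\beta\gamma\delta \mid \alpha$, $\alpha \mid \beta\gamma\delta$. No. 8 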


in agreement with the theory. 
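
The five numbers may be recovered by listing the distributions for each 
specification. The following is an illustrative sketch; the function 
`two_box_distributions` and the use of the letters `a`, `b`, `c`, `d` 
for $\alpha$, $\beta$, $\gamma$, $\delta$ are assumptions. 

```python
from itertools import combinations

def two_box_distributions(objects, sizes):
    """Distinct contents (A_1, A_2) when one box, unspecified, receives
    sizes[0] objects and the other sizes[1]; objects is a string in
    which a repeated letter denotes similar objects."""
    n = len(objects)
    found = set()
    for size1 in set(sizes):                     # either box may be the larger
        for idx in combinations(range(n), size1):
            box1 = "".join(sorted(objects[k] for k in idx))
            box2 = "".join(sorted(objects[k] for k in range(n) if k not in idx))
            found.add((box1, box2))              # similar objects merge here
    return found

for objects in ["aaaa", "aaab", "aabb", "aabc", "abcd"]:
    print(objects, len(two_box_distributions(objects, (3, 1))))
```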


33. Having thus obtained the enumerating function $2h_{w_1}h_{w_2}$ or $h_{w_1}^2$ 
for the special numbers $w_1$, $w_2$ we can include all cases by giving $w_1$, $w_2$ 
all possible values and adding the corresponding enumerating functions. 

Thus for $w = 2$ we have $h_1^2$, 

for $w = 3$, $2h_2h_1$, 

for $w = 4$, $h_2^2 + 2h_3h_1$, 

for $w = 5$, $2h_3h_2 + 2h_4h_1$, 

and so on, while in general we seek the coefficient of $x^w$ in the 
expansion of the function 

$(h_1x + h_2x^2 + h_3x^3 + \ldots)^2$.

We may state the theorem: — 

“The number of distributions of objects of specification $(p_1p_2 \ldots p_s)$ 
into boxes of specification $(1^2)$ is equal to the coefficient of the function 
$(p_1p_2 \ldots p_s)$ in the development of the coefficient of $x^w$ in the expansion 
of the function 

$(h_1x + h_2x^2 + h_3x^3 + \ldots)^2$

where $p_1 + p_2 + \ldots + p_s = w$.” 

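As an example when $w = 4$, 

$2h_3h_1 + h_2^2 = 3\,(4) + 6\,(31) + 7\,(2^2) + 10\,(21^2) + 14\,(1^4)$.

The distributions are (each entry giving the contents of the boxes in the 
form $A_1 \mid A_2$) 

Spec. $(4)$: $\alpha^3 \mid \alpha$, $\alpha \mid \alpha^3$, $\alpha^2 \mid \alpha^2$. No. 3 

Spec. $(31)$: $\alpha^3 \mid \beta$, $\beta \mid \alpha^3$, $\alpha^2\beta \mid \alpha$, $\alpha \mid \alpha^2\beta$,
$\alpha^2 \mid \alpha\beta$, $\alpha\beta \mid \alpha^2$. No. 6 

Spec. $(2^2)$: $\alpha^2\beta \mid \beta$, $\beta \mid \alpha^2\beta$, $\alpha\beta^2 \mid \alpha$, $\alpha \mid \alpha\beta^2$,
$\alpha^2 \mid \beta^2$, $\beta^2 \mid \alpha^2$, $\alpha\beta \mid \alpha\beta$. No. 7 

Spec. $(21^2)$: $\alpha^2\beta \mid \gamma$, $\gamma \mid \alpha^2\beta$, $\alpha^2\gamma \mid \beta$, $\beta \mid \alpha^2\gamma$,
$\alpha\beta\gamma \mid \alpha$, $\alpha \mid \alpha\beta\gamma$, $\alpha^2 \mid \beta\gamma$, $\beta\gamma \mid \alpha^2$,
$\alpha\beta \mid \alpha\gamma$, $\alpha\gamma \mid \alpha\beta$. No. 10 

Spec. $(1^4)$: $\alpha\beta\gamma \mid \delta$, $\delta \mid \alpha\beta\gamma$, $\alpha\beta\delta \mid \gamma$, $\gamma \mid \alpha\beta\delta$,
$\alpha\gamma\delta \mid \beta$, $\beta \mid \alpha\gamma\delta$, $\beta\gamma\delta \mid \alpha$, $\alpha \mid \beta\gamma\delta$,
$\alpha\beta \mid \gamma\delta$, $\gamma\delta \mid \alpha\beta$, $\alpha\gamma \mid \beta\delta$, $\beta\delta \mid \alpha\gamma$,
$\alpha\delta \mid \beta\gamma$, $\beta\gamma \mid \alpha\delta$. No. 14 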
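
A direct count, in which each kind of object is shared out among the 
boxes and distributions leaving a box empty are rejected, reproduces the 
numbers 3, 6, 7, 10, 14. The sketch is illustrative; the names 
`compositions` and `count_distributions` are assumptions. 

```python
from itertools import product

def compositions(total, parts):
    """Ordered tuples of `parts` non-negative integers summing to total."""
    if parts == 1:
        return [(total,)]
    return [(first,) + rest
            for first in range(total + 1)
            for rest in compositions(total - first, parts - 1)]

def count_distributions(spec, m):
    """Ways of placing objects of the given specification into m
    different boxes, no box being empty."""
    count = 0
    # One composition per kind of object: how many of it go in each box.
    for shares in product(*(compositions(p, m) for p in spec)):
        box_totals = [sum(column) for column in zip(*shares)]
        if all(t > 0 for t in box_totals):
            count += 1
    return count

for spec in [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]:
    print(spec, count_distributions(spec, 2))
```

The same function applies to any number of boxes; for three boxes and 
objects of specification $(51)$ it returns 30. 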


34. Passing to the case of three boxes of specification $(1^3)$ we 
consider a distribution in which the boxes contain $w_1$, $w_2$ and $w_3$ objects 
in any order respectively. The possible specifications of these lots of 
objects are shewn by the partitions of the functions which are terms in 
$h_{w_1}$, $h_{w_2}$, $h_{w_3}$ and when these specifications are assigned the corresponding 
symmetric function products will be terms of the developed products 

$h_{w_1}^3$, $3h_{w_1}^2h_{w_2}$, $6h_{w_1}h_{w_2}h_{w_3}$

according to the equalities that present themselves in the numbers 
$w_1$, $w_2$, $w_3$. 

To see how this is we observe that from Art. 27 the distributions 
associated with specifications 

$(m_1m_2 \ldots m_t)$, $(n_1n_2 \ldots n_u)$, $(o_1o_2 \ldots o_v)$




are enumerated by the functions 

$(m_1m_2 \ldots m_t)^3$, $3\,(m_1m_2 \ldots m_t)^2(n_1n_2 \ldots n_u)$,

$6\,(m_1m_2 \ldots m_t)(n_1n_2 \ldots n_u)(o_1o_2 \ldots o_v)$,

according to the identities that subsist between the three partitions. 
It is obvious that if $w_1 = w_2 = w_3$ the three partitions are all of the same 
weight and $h_{w_1}^3$ will give the three functions which have coefficients 
1, 3, 6 respectively. If $w_1$, $w_1$, $w_2$ be the three weights $3h_{w_1}^2h_{w_2}$ involves 
on development the functions with coefficients 3, 6. Finally if $w_1$, $w_2$, 
$w_3$ are three different numbers, $6h_{w_1}h_{w_2}h_{w_3}$ produces all the functions 
which have the coefficient 6. 

We can now include all cases by giving $w_1$, $w_2$, $w_3$ all values and 
adding the corresponding enumerating functions. 

Thus for $w = 3$ we have $h_1^3$, 

for $w = 4$, $3h_2h_1^2$, 

for $w = 5$, $3h_3h_1^2 + 3h_2^2h_1$, 

for $w = 6$, $h_2^3 + 3h_4h_1^2 + 6h_3h_2h_1$, 

and so on. 

In general the enumerating $h$ function is the coefficient of $x^w$ in the 
expansion of 

$(h_1x + h_2x^2 + h_3x^3 + \ldots)^3$

and if we develop this $h$ function the coefficient of the symmetric 
function $(p_1p_2 \ldots p_s)$ is equal to the number of ways of distributing objects 
of specification $(p_1p_2 \ldots p_s)$ into boxes of specification $(1^3)$. 

35. We can now enunciate the general theorem : — 

“The number of ways of distributing objects of specification 
$(p_1p_2 \ldots p_s)$ into boxes of specification $(1^m)$, no box being empty, is equal 
to the coefficient of 

$x^{p_1+p_2+\ldots+p_s}\,(p_1p_2 \ldots p_s)$

in the development of the function 

$(h_1x + h_2x^2 + h_3x^3 + \ldots)^m$.”

In order to be able to use this theorem in practice it is necessary 
to expand products of the functions $h_1$, $h_2$, $h_3$, ... in terms of monomial 
functions. This may be readily accomplished by use of the operative 
symbols $D_0$, $D_1$, $D_2$, ... because observing in the first place that 

$D_0h_3 = D_0\{(3) + (21) + (1^3)\} = (3) + (21) + (1^3) = h_3$,

$D_1h_3 = (2) + (1^2) = h_2$,

$D_2h_3 = (1) = h_1$,

$D_3h_3 = 1$,

it is easy to see that 

$D_mh_w = h_{w-m}$

is universally true if we agree that $h_0 = 1$. 

If $h_w'$ be the homogeneous product-sum, of weight $w$, of the 
quantities $\beta$, $\gamma$, $\delta$, ... we may write 

$h_w = D_0h_w' + \alpha D_1h_w' + \alpha^2D_2h_w' + \alpha^3D_3h_w' + \ldots$,

so that 

$h_{w_1}h_{w_2} = (D_0h_{w_1}' + \alpha D_1h_{w_1}' + \alpha^2D_2h_{w_1}' + \ldots)(D_0h_{w_2}' + \alpha D_1h_{w_2}' + \alpha^2D_2h_{w_2}' + \ldots)$.

But 

$h_{w_1}h_{w_2} = D_0(h_{w_1}'h_{w_2}') + \alpha D_1(h_{w_1}'h_{w_2}') + \alpha^2D_2(h_{w_1}'h_{w_2}') + \ldots$.

Hence equating coefficients of like powers of $\alpha$ and suppressing the 
dashes by writing $\alpha$, $\beta$, $\gamma$, ... for $\beta$, $\gamma$, $\delta$, ... 

$D_m(h_{w_1}h_{w_2}) = D_0h_{w_1} \cdot D_mh_{w_2} + D_1h_{w_1} \cdot D_{m-1}h_{w_2} + \ldots + D_mh_{w_1} \cdot D_0h_{w_2}$

$= h_{w_1}h_{w_2-m} + h_{w_1-1}h_{w_2-m+1} + \ldots + h_{w_1-m}h_{w_2}$,

shewing the way in which the symbol $D_m$ operates upon any product 
$h_{w_1}h_{w_2}$. Compare Art. 30. 

Similarly $D_m$ operates upon a product of $s$ functions $h_{w_1}h_{w_2} \ldots h_{w_s}$ 
through the medium of the various compositions of $m$ into $s$ parts, zero 
counting as a part. 

36. Thus if we desire to develop the function 

$h_2^3 + 3h_4h_1^2 + 6h_3h_2h_1$

and require the coefficient of the function $(51)$ the process may be as 
follows: 

$D_5(h_2^3 + 3h_4h_1^2 + 6h_3h_2h_1)$

$= D_2h_2 \cdot D_2h_2 \cdot D_1h_2 + D_2h_2 \cdot D_1h_2 \cdot D_2h_2 + D_1h_2 \cdot D_2h_2 \cdot D_2h_2$

$+ 3\,(D_4h_4 \cdot D_1h_1 \cdot D_0h_1 + D_4h_4 \cdot D_0h_1 \cdot D_1h_1 + D_3h_4 \cdot D_1h_1 \cdot D_1h_1)$

$+ 6\,(D_3h_3 \cdot D_2h_2 \cdot D_0h_1 + D_3h_3 \cdot D_1h_2 \cdot D_1h_1 + D_2h_3 \cdot D_2h_2 \cdot D_1h_1)$

$= 3h_1 + 9h_1 + 18h_1 = 30h_1$,

$D_1D_5(h_2^3 + 3h_4h_1^2 + 6h_3h_2h_1) = 30$,

establishing that objects of specification $(51)$ can be placed in boxes of 
specification $(1^3)$, no box being empty, in 30 ways. 
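
Since $D_mh_w = h_{w-m}$, the operation upon a product of $h$ functions 
reduces to bookkeeping with the suffixes. The sketch below is an 
illustration under assumptions: a product $h_{w_1}h_{w_2}\ldots$ is held 
as the tuple of its suffixes, $h_0$ being unity, and the name 
`apply_D_h` is invented. 

```python
from collections import Counter
from itertools import product

def apply_D_h(m, poly):
    """D_m on a sum of products of h functions. Each composition of m
    among the factors lowers the suffixes by its parts; a part larger
    than the suffix it meets gives a vanishing term."""
    out = Counter()
    for weights, c in poly.items():
        for comp in product(range(m + 1), repeat=len(weights)):
            if sum(comp) == m and all(s <= w for s, w in zip(comp, weights)):
                out[tuple(w - s for w, s in zip(weights, comp))] += c
    return out

# h_2^3 + 3 h_4 h_1^2 + 6 h_3 h_2 h_1
poly = Counter({(2, 2, 2): 1, (4, 1, 1): 3, (3, 2, 1): 6})
after_D5 = apply_D_h(5, poly)           # thirty terms, each equal to h_1
after_D1_D5 = apply_D_h(1, after_D5)    # every suffix reduced to zero

print(sum(after_D5.values()), after_D1_D5[(0, 0, 0)])
```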

37. There is an alternative process which is of much interest. 

Write $h_1x + h_2x^2 + h_3x^3 + \ldots = H$, 

and note that the coefficient of $x^{p_1+p_2+\ldots+p_s}\,(p_1p_2 \ldots p_s)$ in $H^m$ is 

$D_{p_1}D_{p_2} \ldots D_{p_s}$ (coefficient of $x^{p_1+p_2+\ldots+p_s}$ in $H^m$). 



Now $H^m = \{(1 + H) - 1\}^m$

$= (1 + H)^m - \binom{m}{1}(1 + H)^{m-1} + \binom{m}{2}(1 + H)^{m-2} - \ldots$,

and $D_p(1 + H) = x^p(1 + H)$ by the law of operation. 

Also $D_p(1 + H)^2 = D_p(1 + H) \cdot D_0(1 + H) + D_{p-1}(1 + H) \cdot D_1(1 + H) + \ldots$,

there being one term on the right-hand side corresponding to every 
composition of $p$ into two parts. 

By Art. 30 the number of these compositions is $\binom{p+1}{1}$. 

Hence $x^{-p}D_p(1 + H)^2 = \binom{p+1}{1}(1 + H)^2$; 

also $D_p(1 + H)^3 = D_p(1 + H) \cdot D_0(1 + H) \cdot D_0(1 + H) + \ldots$

$\quad + D_a(1 + H) \cdot D_b(1 + H) \cdot D_c(1 + H) + \ldots$,

there being one term on the right-hand side corresponding to each 
composition of $p$ into three parts. The number of these compositions is, 
by Art. 30, $\binom{p+2}{2}$. 

We have $x^{-p}D_p(1 + H)^3 = \binom{p+2}{2}(1 + H)^3$, 

and generally $x^{-p}D_p(1 + H)^m = \binom{p+m-1}{m-1}(1 + H)^m$. 

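Making use of these results 

$x^{-p_1}D_{p_1}H^m = \binom{p_1+m-1}{m-1}(1 + H)^m - \binom{m}{1}\binom{p_1+m-2}{m-2}(1 + H)^{m-1}$

$\quad + \binom{m}{2}\binom{p_1+m-3}{m-3}(1 + H)^{m-2} - \ldots$,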




and ultimately 

$x^{-p_1-p_2-\ldots-p_s}D_{p_1}D_{p_2} \ldots D_{p_s}H^m$

$= \binom{p_1+m-1}{m-1}\binom{p_2+m-1}{m-1} \cdots \binom{p_s+m-1}{m-1}$

$- \binom{m}{1}\binom{p_1+m-2}{m-2}\binom{p_2+m-2}{m-2} \cdots \binom{p_s+m-2}{m-2}$

$+ \binom{m}{2}\binom{p_1+m-3}{m-3}\binom{p_2+m-3}{m-3} \cdots \binom{p_s+m-3}{m-3}$

$- \ldots$,

because we know that the right-hand side cannot involve $x$. We may 
therefore finally put $x$ and therefore $H$ equal to zero. 

To verify the result of the preceding Article put 
$m = 3$, $p_1 = 5$, $p_2 = 1$. 

The formula gives 

$\binom{7}{2}\binom{3}{2} - \binom{3}{1}\binom{6}{1}\binom{2}{1} + \binom{3}{2}\binom{5}{0}\binom{1}{0} = 63 - 36 + 3 = 30$.

The series written down is thus established as enumerating the number 
of distributions of objects of specification $(p_1p_2 \ldots p_s)$ into boxes of 
specification $(1^m)$, no box being empty. 
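
The series is readily evaluated by machine. The following sketch 
assumes the series in the form of alternating products of binomial 
numbers, as described above, and the function name 
`distributions_no_empty` is an invention for the purpose of 
illustration. 

```python
from math import comb, prod

def distributions_no_empty(spec, m):
    """Alternating series for the number of ways of placing objects of
    the given specification into m different boxes, none empty."""
    return sum(
        (-1) ** k * comb(m, k)
        * prod(comb(p + m - k - 1, m - k - 1) for p in spec)
        for k in range(m)             # the k-th term uses m - k boxes
    )

print(distributions_no_empty((5, 1), 3))
print([distributions_no_empty(s, 2)
       for s in [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]])
```

The second line of output repeats, for two boxes, the five numbers 
obtained in Art. 33. 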

38. In the above investigation there is no restriction upon the 
number of times that any one of the quantities $\alpha$, $\beta$, $\gamma$, ... may appear 
in the same box. 

If no object is to appear more than once in the same box, a box 
which contains $w_1$ objects must contain objects denoted by the letters 
of one of the terms of $a_{w_1} = (1^{w_1})$. Hence instead of the functions 
$h_1$, $h_2$, $h_3$, ... we have presented to us the functions $a_1$, $a_2$, $a_3$, ... 
and writing 

$a_1x + a_2x^2 + a_3x^3 + \ldots = A$

the enumerating function is the coefficient of $x^{p_1+p_2+\ldots+p_s}$ in 

$A^m$.

If $m = 3$, the function which now enumerates the distributions into 
boxes of specification $(1^3)$ is 

$a_2^3 + 3a_4a_1^2 + 6a_3a_2a_1$

$= (1^2)^3 + 3\,(1^4)(1)^2 + 6\,(1^3)(1^2)(1)$,

and if the objects be of specification $(321)$ the number of distributions is 

$D_1D_2D_3\{(1^2)^3 + 3\,(1^4)(1)^2 + 6\,(1^3)(1^2)(1)\}$.




By the rule of operation we find 

$D_3\{(1^2)^3 + 3\,(1^4)(1)^2 + 6\,(1^3)(1^2)(1)\}$

$= (1)^3 + 3\,(1^3) + 6\,(1^2)(1)$,

$D_2D_3$ produces $3\,(1) + 6\,(1)$, 

and finally $D_1D_2D_3\{(1^2)^3 + 3\,(1^4)(1)^2 + 6\,(1^3)(1^2)(1)\} = 9$. 

The actual distributions are 


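$A_1$  $A_2$  $A_3$

$\alpha\beta\gamma$  $\alpha\beta$  $\alpha$
$\alpha\beta\gamma$  $\alpha$  $\alpha\beta$
$\alpha\beta$  $\alpha\beta\gamma$  $\alpha$
$\alpha$  $\alpha\beta\gamma$  $\alpha\beta$
$\alpha\beta$  $\alpha$  $\alpha\beta\gamma$
$\alpha$  $\alpha\beta$  $\alpha\beta\gamma$

$\alpha\beta$  $\alpha\beta$  $\alpha\gamma$
$\alpha\beta$  $\alpha\gamma$  $\alpha\beta$
$\alpha\gamma$  $\alpha\beta$  $\alpha\beta$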

39. In the alternative method we write 

$1 + A = 1 + a_1x + a_2x^2 + a_3x^3 + \ldots$.

The reader will have no difficulty in establishing the formula 

$D_p(1 + A)^m = \binom{m}{p}x^p(1 + A)^m$

so that operating upon $A^m$ in the form 

$(1 + A)^m - \binom{m}{1}(1 + A)^{m-1} + \binom{m}{2}(1 + A)^{m-2} - \ldots$

we readily reach the number which enumerates the distributions of 
objects of specification $(p_1p_2 \ldots p_s)$ into boxes of specification $(1^m)$, no 
box being empty, subject to the condition that no particular object is 
to appear twice in the same box. The number is 

$\binom{m}{p_1}\binom{m}{p_2} \cdots \binom{m}{p_s} - \binom{m}{1}\binom{m-1}{p_1}\binom{m-1}{p_2} \cdots \binom{m-1}{p_s}$

$\quad + \binom{m}{2}\binom{m-2}{p_1}\binom{m-2}{p_2} \cdots \binom{m-2}{p_s} - \ldots$.

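To verify the special case $m = 3$, $p_1 = 3$, $p_2 = 2$, $p_3 = 1$, we find 

$\binom{3}{3}\binom{3}{2}\binom{3}{1} = 9$,

every later term of the series vanishing because $\binom{2}{3} = 0$. 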
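
The corresponding series for the restricted problem may be evaluated in 
the same manner. The sketch assumes the series in the form of 
alternating products of the binomial numbers $\binom{m-k}{p}$, and the 
name `distributions_no_repeat` is illustrative only. 

```python
from math import comb, prod

def distributions_no_repeat(spec, m):
    """Number of ways of placing objects of the given specification
    into m different boxes, none empty, no box holding the same kind
    of object twice."""
    return sum(
        (-1) ** k * comb(m, k) * prod(comb(m - k, p) for p in spec)
        for k in range(m + 1)
    )

print(distributions_no_repeat((3, 2, 1), 3))
```

For objects of specification $(321)$ and three boxes only the leading 
term survives, since no kind of object occurring three times can be 
spread over fewer than three boxes. 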





The more general condition that no object is to appear more than 
$k$ times in the same box is treated by means of new functions 

$k_1$, $k_2$, $k_3$, ...

such that $k_s$ is derived from $h_s$ by striking out from the latter all 
partitions which contain parts greater than $k$. We then operate through 
the medium of compositions which contain no part greater than $k$ and 
we reach a general solution analogous to those which employed the $h$ 
and $a$ functions. 


CHAPTER IV 


DISTRIBUTION WHEN OBJECTS AND BOXES ARE EQUAL 
IN NUMBER 

40. We now come to an important case of distribution which is of 
particular interest in view of the light that it throws upon the algebra 
of symmetric functions. We consider a number of objects and an equal 
number of boxes. We are given the specifications both of the objects 
and of the boxes and place one object in each box. How many 
distributions are there? 

Suppose that $q_1$ of the boxes are precisely similar, so that they have 
the specification $(q_1)$. Whatever may be the specification of the $q_1$ 
objects that are placed in them it is certain that they have only one 
distribution, because the boxes being identical no permutation of 
the objects alters the distribution. Denote these boxes each by $A_1$. 
The specification of the $q_1$ objects must be one of the partitions which 
occur in $h_{q_1}$ when expressed in terms of monomial functions. As 
one distribution we may take any product of $\alpha$, $\beta$, $\gamma$, ... that occurs 
in $h_{q_1}$. Also if there be $q_2$ boxes, each denoted by $A_2$, one distribution 
into the $q_2$ boxes will be any product of $\alpha$, $\beta$, $\gamma$, ... that occurs in $h_{q_2}$. 
And similarly for the boxes $q_3$, $q_4$, ... $q_t$. Hence we write down the 
factors of the product 

$h_{q_1}h_{q_2} \ldots h_{q_t}$,

each factor being written out in full, and obtain a distribution by taking 
any term of $h_{q_1}$ for the $q_1$ boxes $A_1$, any term of $h_{q_2}$ for the $q_2$ boxes $A_2$, 
etc.... any term of $h_{q_t}$ for the $q_t$ boxes $A_t$. If these terms when 
assembled constitute a combination which has the specification 

$(p_1p_2 \ldots p_s)$

we will have one instance of a distribution of objects of specification 
$(p_1p_2 \ldots p_s)$ into boxes of specification $(q_1q_2 \ldots q_t)$, one object being in 
each box. It follows that the objects denoted by 

$\alpha_1^{p_1}\alpha_2^{p_2}\cdots\alpha_s^{p_s}$

can be distributed into the boxes just as often as the term 
$\alpha_1^{p_1}\alpha_2^{p_2}\cdots\alpha_s^{p_s}$ 
arises in the product $h_{q_1}h_{q_2} \ldots h_{q_t}$. The enumeration of the distributions 
is therefore given by the coefficient of the function $(p_1p_2 \ldots p_s)$ in the 
development of $h_{q_1}h_{q_2} \ldots h_{q_t}$ in a series of monomial symmetric functions. 
We have the theorem : — 




“The number of ways of distributing $n$ objects of specification
$(p_1 p_2 \ldots p_s)$ into $n$ boxes of specification $(q_1 q_2 \ldots q_t)$, one object into
each box, is equal to the coefficient of symmetric function $(p_1 p_2 \ldots p_s)$
in the development of the product $h_{q_1} h_{q_2} \ldots h_{q_t}$.”

As an example, if $(p_1 p_2 \ldots p_s) = (411)$, $(q_1 q_2 \ldots q_t) = (321)$, one
distribution is

    A₁ A₁ A₁   A₂ A₂   A₃
    α  α  α    α  β    γ

corresponding to the terms $\alpha^3$, $\alpha\beta$, $\gamma$ in $h_3$, $h_2$, $h_1$ respectively. The number
of distributions is from previous work

$$D_4 D_1^2 h_3 h_2 h_1 = 8,$$

and the complete table of distributions is

    A₁ A₁ A₁   A₂ A₂   A₃

    α  α  α    α  β    γ
    α  α  α    α  γ    β
    α  α  α    β  γ    α
    α  α  β    α  α    γ
    α  α  β    α  γ    α
    α  α  γ    α  α    β
    α  α  γ    α  β    α
    α  β  γ    α  α    α

A table giving the development of products of the functions

$$h_1, h_2, h_3, \ldots$$

will give the complete numerical solution.
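The table of eight distributions can be regenerated by brute force. The sketch below is an illustration only: the helper `distributions` and the use of the Latin letters `a`, `b`, `c` as stand-ins for the Greek letters are assumptions made here, not notation from the text. Each group of similar boxes receives a multiset of objects, written as a word.

```python
from itertools import product

def distributions(objects, boxes, letters="abcdefgh"):
    """Yield each distribution as a tuple of words, one word per group of
    similar boxes; objects[i] is the number of objects of the i-th kind."""
    def fill(remaining, j):
        if j == len(boxes):
            if not any(remaining):      # every object has been placed
                yield ()
            return
        # choose how many objects of each kind enter the j-th group of boxes
        for take in product(*(range(r + 1) for r in remaining)):
            if sum(take) == boxes[j]:
                word = "".join(l * t for l, t in zip(letters, take))
                left = tuple(r - t for r, t in zip(remaining, take))
                for rest in fill(left, j + 1):
                    yield (word,) + rest
    yield from fill(tuple(objects), 0)

found = list(distributions((4, 1, 1), (3, 2, 1)))
for d in found:
    print(*d)
print(len(found), "distributions")
```

The count returned is the coefficient of the monomial function in the product of the $h$ functions, since choosing a word for a group of boxes is the same as choosing a term of the corresponding $h$.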


41. We now write the particular distribution we presented above
in the form, writing $A, B, C, \ldots$ for $A_1, A_2, A_3, \ldots$,

    A A A B B C
    α α α α β γ

and observe that if we interchange the letters by writing $A$ for $\alpha$ and
$\alpha$ for $A$, $B$ for $\beta$ and $\beta$ for $B$, $C$ for $\gamma$ and $\gamma$ for $C$, we reach a
distribution

    A A A A B C
    α α α β β γ

of objects of specification (321) into boxes of specification (411), and
since we may transform every distribution in this way we obtain the
theorem:

“$n$ objects of specification $(p_1 p_2 \ldots p_s)$ can be distributed into
$n$ boxes of specification $(q_1 q_2 \ldots q_t)$, one object in each box, in just as
many ways as $n$ objects of specification $(q_1 q_2 \ldots q_t)$ can be distributed
into $n$ boxes of specification $(p_1 p_2 \ldots p_s)$, one object in each box.”


42. This quite obvious fact in the Theory of Distributions is next 
seen to lead to a Theorem of Symmetry in Algebra which is not only 
not obvious but was for a long time unsuspected. 

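If we denote by

$$C\begin{pmatrix} p_1 p_2 \ldots p_s \\ q_1 q_2 \ldots q_t \end{pmatrix}$$

the number of the distributions under examination we have shewn that

$$C\begin{pmatrix} p_1 p_2 \ldots p_s \\ q_1 q_2 \ldots q_t \end{pmatrix}
= C\begin{pmatrix} q_1 q_2 \ldots q_t \\ p_1 p_2 \ldots p_s \end{pmatrix},$$

and this leads to the relation

$$D_{p_1} D_{p_2} \ldots D_{p_s} h_{q_1} h_{q_2} \ldots h_{q_t}
= D_{q_1} D_{q_2} \ldots D_{q_t} h_{p_1} h_{p_2} \ldots h_{p_s},$$
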
or, in other words, the coefficient of symmetric function $(p_1 p_2 \ldots p_s)$ in
the development of $h_{q_1} h_{q_2} \ldots h_{q_t}$ is equal to the coefficient of symmetric
function $(q_1 q_2 \ldots q_t)$ in the development of $h_{p_1} h_{p_2} \ldots h_{p_s}$. This is called
a ‘Law of Symmetry,’ because in a table which expresses the $h$ products
in terms of monomials for a given weight the rows will read the same
as the columns. Thus such a table for the weight four is

              (4)   (31)   (2²)   (21²)   (1⁴)

    h₄         1     1      1      1       1
    h₃h₁       1     2      2      3       4
    h₂²        1     2      3      4       6
    h₂h₁²      1     3      4      7      12
    h₁⁴        1     4      6     12      24
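The weight-four table, and its symmetry, can be checked numerically. This is a sketch only; `partitions` and `coefficient` are hypothetical helper names, and the coefficient is computed as the number of arrays of non-negative integers whose row sums are the parts of one partition and whose column sums are the parts of the other.

```python
from itertools import product

def partitions(n, largest=None):
    """Partitions of n, parts in descending order, largest partition first."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def coefficient(p, q):
    """Coefficient of the monomial function (p) in h_{q1} h_{q2} ...,
    counted as arrays with row sums p and column sums q."""
    def rows(i, cols):
        if i == len(p):
            return 1 if not any(cols) else 0
        total = 0
        for row in product(*(range(c + 1) for c in cols)):
            if sum(row) == p[i]:
                total += rows(i + 1, tuple(c - r for c, r in zip(cols, row)))
        return total
    return rows(0, tuple(q))

parts = list(partitions(4))            # (4), (31), (22), (211), (1111)
table = [[coefficient(p, q) for p in parts] for q in parts]
for q, line in zip(parts, table):
    print(q, line)
```

Reading `table` by rows or by columns gives the same numbers, which is the Law of Symmetry for this weight.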


43. If we look again at the distribution

    A A A B B C
    α α α α β γ

the symmetry that arises from the interchange of letters leads to the
idea that instead of regarding the letters $A, B, C$ as denoting boxes we
may regard them as also denoting objects, but of a different kind from
the objects denoted by $\alpha, \beta, \gamma$; so that we may regard the distribution
as being in fact a pairing of objects of two different sets of objects,
one object being taken from each set to form a pair.

Observe that one set of objects involves no objects which appear in
the other set. If the objects of both sets had been drawn from one set
of objects, so that the objects in one set were not distinct from the
objects in the other set, the distribution theory considered here would
not be valid. For example, if we distribute the objects $\alpha, \beta, \gamma$ into the
boxes $A, B, C$ we obtain the six pairings of the objects $\alpha, \beta, \gamma$ with the
objects $A, B, C$,

    ABC   ABC   ABC   ABC   ABC   ABC
    αβγ   αγβ   βαγ   βγα   γαβ   γβα

but if we pair off the identical sets $\alpha, \beta, \gamma$; $\alpha, \beta, \gamma$, we obtain only the
five pairings

    αβγ   αβγ   αβγ   αβγ   αβγ
    αβγ   αγβ   βαγ   βγα   γβα

because the omitted pairing

    αβγ
    γαβ

is the same as

    αβγ
    βγα

When any object in the one set also appears in the other we have 
a distribution, or pairing, which requires separate consideration, and 
indeed has been investigated up to a certain point*. 
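The contrast between six and five pairings is easy to confirm by enumeration. In this sketch the letters `a`, `b`, `c` stand in for the Greek letters (an assumption of the example); a pairing of two identical sets is recorded as an unordered collection of unordered pairs.

```python
from itertools import permutations

letters = "abc"  # stand-ins for alpha, beta, gamma

# Distinct sets: the boxes A, B, C are fixed, so each permutation is a new pairing.
distinct = {tuple(zip("ABC", perm)) for perm in permutations(letters)}

# Identical sets: a pair {x, y} is unordered, and so is the collection of pairs.
identical = {
    tuple(sorted(tuple(sorted(pair)) for pair in zip(letters, perm)))
    for perm in permutations(letters)
}

print(len(distinct), len(identical))
```

The two cyclic permutations produce the same three pairs, which is why one pairing is lost when the sets are identical.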


44. The distribution, regarded as a pairing off of sets of objects,
which are distinct, is to be regarded as having a specification
depending upon similarities of object-pairs. Thus the above pairing may
be written

$$(A\alpha)^3 (B\alpha) (B\beta) (C\gamma),$$

which is said to have the specification (3111), which is also a partition
of 6, the number of the objects distributed.

We may say that objects of specification (411) have been distributed
into boxes of specification (321), one object in each box, in such
wise that the distribution has the specification (3111).

* “Combinations derived from m identical sets of n different letters and their
connexion with general magic squares,” by Major P. A. MacMahon, Proc. L. M. S.
Ser. 2, Vol. 17, Part i.




Or, we may say that objects of specification (411) have been paired 
off with other objects of specification (321) in such wise that the 
specification of the object-pairs is (3111). 

It is next to be noticed that the interchange of Capital and Greek 
letters does not alter the specification of the distribution. For looking 
at the object-pairing 

$$(A\alpha)^3 (B\alpha)^1 (B\beta)^1 (C\gamma)^1,$$

it is clear that the interchange of letters cannot affect the repetitional 
numbers 3, 1, 1, 1, which are the parts of the partition which denote 
the specification. 

45. We have before us clearly quite a new question, viz. the enumeration
of distributions, of given specification, of objects of specification
$(p_1 p_2 \ldots p_s)$ into boxes of specification $(q_1 q_2 \ldots q_t)$, one object into each
box.

In Chapter V this question is considered up to a point. It has
been solved completely in Combinatory Analysis. Suffice it to say that
the theory has an important bearing upon the Algebra of Symmetric
Functions. It establishes a refined law of symmetry connected with the
partitions $(p_1 p_2 \ldots p_s)$, $(q_1 q_2 \ldots q_t)$, and the partition which denotes the
distribution due to the circumstance that the first two of these partitions
may be interchanged without altering the enumeration.

46. In the present theory the homogeneous product-sums $h_1, h_2, h_3, \ldots$
have appeared because no limit was imposed upon the number of times
that similar objects may appear in similar boxes. Thus in boxes $A, A, A$,
we have supposed it permissible to place objects represented by any of
the terms $\alpha\alpha\alpha$, $\alpha\alpha\beta$, $\alpha\beta\gamma, \ldots$ that compose $h_3$. The specifications of this
portion of the distribution might be $(3)$, $(21)$ or $(1^3)$. If we had resolved
that not more than two similar objects were to be placed in
similar boxes we could not have placed the objects $\alpha, \alpha, \alpha$ into the
boxes $A, A, A$, and instead of the function $h_3$ we would have taken
the function

$$(21) + (1^3),$$

and generally, in each of the functions $h_1, h_2, h_3, \ldots$, we would have
deleted all functions which in the partition notation are denoted by
partitions which involve parts greater than 2. If the conditions be that
not more than $k$ similar objects are to be placed in similar boxes we
substitute for $h_1, h_2, h_3, \ldots$ the corresponding set of functions $k_1, k_2, k_3, \ldots$
in which the deletion of partitions involving parts greater than $k$ has
been carried out. We then find that the number of distributions is
equal to the coefficient of the function $(p_1 p_2 \ldots p_s)$ in the development
of

$$k_{q_1} k_{q_2} \ldots k_{q_t},$$

and establish by interchange of Capital and Greek letters that the
distributions, subject to the same condition, of objects of specification
$(q_1 q_2 \ldots q_t)$ into boxes of specification $(p_1 p_2 \ldots p_s)$ are enumerated by
the same number.

We thus see that the coefficient of $(p_1 p_2 \ldots p_s)$ in $k_{q_1} k_{q_2} \ldots k_{q_t}$ is
equal to the coefficient of $(q_1 q_2 \ldots q_t)$ in $k_{p_1} k_{p_2} \ldots k_{p_s}$.

In other words we prove that

$$D_{p_1} D_{p_2} \ldots D_{p_s} k_{q_1} k_{q_2} \ldots k_{q_t}
= D_{q_1} D_{q_2} \ldots D_{q_t} k_{p_1} k_{p_2} \ldots k_{p_s}.$$

Moreover, since

$$D_p h_m = h_{m-p},$$

also

$$D_p k_m = k_{m-p},$$

the evaluation of the coefficients can be carried out.

The specification of the distribution is clearly not altered by the
interchange of Capital and Greek letters and we are led to an extended
theory of symmetry in the Algebra of Symmetric Functions.
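The restricted enumeration and its symmetry can be illustrated numerically. The sketch below rests on one assumption made explicit here: a distribution is recorded as an array whose entry in a given row and column is the number of objects of that kind placed in that group of similar boxes, so the condition "not more than $k$ similar objects in similar boxes" becomes "no entry exceeds $k$". The name `count_capped` is hypothetical.

```python
from itertools import product

def count_capped(p, q, k):
    """Number of arrays with row sums p, column sums q and no entry above k."""
    def rows(i, cols):
        if i == len(p):
            return 1 if not any(cols) else 0
        total = 0
        for row in product(*(range(min(c, k) + 1) for c in cols)):
            if sum(row) == p[i]:
                total += rows(i + 1, tuple(c - r for c, r in zip(cols, row)))
        return total
    return rows(0, tuple(q))

# objects (411) into boxes (321), at most two similar objects in similar boxes
print(count_capped((4, 1, 1), (3, 2, 1), 2))
# the same with objects and boxes interchanged
print(count_capped((3, 2, 1), (4, 1, 1), 2))
```

Taking `k` at least as large as every part removes the restriction and returns the unrestricted coefficient from the $h$ products.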

47. The case $k = 1$ is interesting because the homogeneous product-sums
become the elementary functions

$$(1), (1^2), (1^3), \ldots = a_1, a_2, a_3, \ldots$$

and we establish that the coefficient of the function $(p_1 p_2 \ldots p_s)$ in the
development of

$$a_{q_1} a_{q_2} \ldots a_{q_t}$$

is the same as that of $(q_1 q_2 \ldots q_t)$ in the development of

$$a_{p_1} a_{p_2} \ldots a_{p_s}.$$

This particular case of symmetric function symmetry has been known
since the time of Meyer Hirsch early in the nineteenth century and
several proofs have been given of it. That here given, based upon the
theory of distribution, is the simplest and most suggestive. Since the
specification of one of these distributions cannot involve any number
greater than unity, we see that every distribution must have the same
specification, viz. $(1^n)$, where $n$ is the number of objects. In the
calculation the symbol $D$ operates entirely through the medium of
compositions of numbers which are composed entirely of units and zeros. This
is so because

$$D_m (1^p) \equiv D_m a_p = \text{zero if } m \text{ be greater than unity.}$$




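Thus

$$\begin{aligned}
D_3 a_2 a_1^2 = D_3 (1^2)(1)(1) &= D_1 (1^2) \cdot D_1 (1) \cdot D_1 (1) = (1), \\
D_2 (1^2)(1)(1) &= D_1 (1^2) \cdot D_1 (1) \cdot D_0 (1) + D_1 (1^2) \cdot D_0 (1) \cdot D_1 (1) \\
&\quad + D_0 (1^2) \cdot D_1 (1) \cdot D_1 (1) \\
&= 2 (1)^2 + (1^2), \\
D_1 (1^2)(1)(1) &= D_1 (1^2) \cdot D_0 (1) \cdot D_0 (1) + D_0 (1^2) \cdot D_1 (1) \cdot D_0 (1) \\
&\quad + D_0 (1^2) \cdot D_0 (1) \cdot D_1 (1) \\
&= (1)^3 + 2 (1^2)(1).
\end{aligned}$$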


48. It has been established that the number

$$D_{p_1} D_{p_2} \ldots D_{p_s} h_{q_1} h_{q_2} \ldots h_{q_t}$$

enumerates distributions of objects into boxes when the distributions
are subject to certain conditions.

We can now shew, by reasoning upon the method of obtaining this
result, that the same number enumerates certain arithmetical
constructions of quite a different nature. When $D_{p_1}$ operates upon $h_{q_1} h_{q_2} \ldots h_{q_t}$
it acts through a number of compositions of $p_1$ into $t$ parts, zero
counting as a part. In this way we obtain the sum of a number of products
of which the type is

$$h_{q_1 - c_1} h_{q_2 - c_2} \ldots h_{q_t - c_t},$$

where $c_1 c_2 \ldots c_t$ is a composition of $p_1$.

Each of these products has unity for coefficient.

Restricting attention to the product above written the operation of
$D_{p_2}$ is performed through compositions of $p_2$, and we obtain from the
one product we are attending to a number of products of which the
type is

$$h_{q_1 - c_1 - d_1} h_{q_2 - c_2 - d_2} \ldots h_{q_t - c_t - d_t},$$

where $d_1 d_2 \ldots d_t$ is a composition of $p_2$.

Each of these products has unity for coefficient.

Restricting the attention to this last written product the operation
of $D_{p_3}$ yields a number of products of which

$$h_{q_1 - c_1 - d_1 - e_1} h_{q_2 - c_2 - d_2 - e_2} \ldots$$

is the type, where $e_1 e_2 \ldots e_t$ is a composition of $p_3$.

Each of these products has unity for coefficient.

Finally, by this process, when we operate with $D_{p_s}$ through one of
the compositions of $p_s$, viz.

$$\sigma_1 \sigma_2 \ldots \sigma_t,$$

we reach the product

$$1 \times h_0 h_0 \ldots h_0 = 1.$$




We will then have arrived at the enumeration of one of our distributions
through the medium of the succession of compositions

$$c_1 c_2 \ldots c_t, \quad d_1 d_2 \ldots d_t, \quad \ldots, \quad \sigma_1 \sigma_2 \ldots \sigma_t,$$

of the numbers $p_1, p_2, p_3, \ldots p_s$ respectively.

We may say that the particular distribution thus enumerated is in
correspondence with the numbered diagram

$$\begin{array}{ccccc}
c_1 & c_2 & c_3 & \ldots & c_t \\
d_1 & d_2 & d_3 & \ldots & d_t \\
e_1 & e_2 & e_3 & \ldots & e_t \\
\vdots & \vdots & \vdots & & \vdots \\
\sigma_1 & \sigma_2 & \sigma_3 & \ldots & \sigma_t
\end{array}$$

which involves a rectangle of $s$ rows and $t$ columns.

What is the definition of this diagram? Clearly the sums of the
numbers in the successive rows must be $p_1, p_2, p_3, \ldots p_s$ respectively,
and the sums of the numbers in the successive columns must be
$q_1, q_2, q_3, \ldots q_t$ respectively. The numbers must be positive integers
(zero included) and there is no restriction upon the magnitude.

To every such diagram also there corresponds one distribution.
Hence the number

$$D_{p_1} D_{p_2} \ldots D_{p_s} h_{q_1} h_{q_2} \ldots h_{q_t}$$

enumerates the diagrams so defined.

To take a very simple example, the number

$$D_2^2 h_2 h_1^2 = 4$$

enumerates the diagrams

    2 0 0     1 1 0     1 0 1     0 1 1
    0 1 1     1 0 1     1 1 0     2 0 0

where the rows add up to 2, 2 and the columns to 2, 1, 1.
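The numbered diagrams of this article can be listed directly from their definition. What follows is a sketch under that definition only (non-negative integers, prescribed row and column sums); the generator name `diagrams` is an assumption of the example.

```python
from itertools import product

def diagrams(row_sums, col_sums):
    """Yield every array of non-negative integers with the given row sums
    and column sums, as a tuple of rows."""
    def build(i, cols):
        if i == len(row_sums):
            if not any(cols):           # all column sums are used up
                yield ()
            return
        for row in product(*(range(c + 1) for c in cols)):
            if sum(row) == row_sums[i]:
                left = tuple(c - r for c, r in zip(cols, row))
                for rest in build(i + 1, left):
                    yield (row,) + rest
    yield from build(0, tuple(col_sums))

found = list(diagrams((2, 2), (2, 1, 1)))
for d in found:
    for row in d:
        print(*row)
    print()
```

Each row chosen in `build` plays the part of one composition through which a symbol $D$ operates, and the columns keep track of the suffixes still remaining on the $h$ factors.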


49. We have an analogous enumeration also when the condition is 
that not more than k similar objects are to be placed in similar boxes. 

In every case the reciprocity that exists between the specifications 
of the objects and of the boxes can be exhibited by rotating the dia- 
grams through a right angle. 




These identities of enumeration are simple instances of a very 
extensive theory in Combinatory Analysis. 

50. Before closing this chapter it may be remarked that the
placing of objects of any specification in boxes which are identical, one
object in each box, is equivalent from a distribution point of view to
placing the same objects in a single box. In both cases the objects
can be permuted in any manner without changing the enumeration.
There is in fact only one distribution. Consider then a distribution such
that $q_1$ objects are placed in $q_1$ similar boxes $A_1$, $q_2$ objects in $q_2$
similar boxes $A_2$, ... $q_t$ objects in $q_t$ similar boxes $A_t$, the sets of objects
having any specifications and one object being in each box. In contrast
with this consider the $q_1, q_2, \ldots q_t$ objects placed in single boxes
$B_1, B_2, \ldots B_t$ respectively. If the numbers $q_1, q_2, \ldots q_t$ be all different
we cannot in the first distribution interchange any pair of the sets of
$q_1, q_2, \ldots q_t$ objects because, for example, the $q_r$ objects will only fit into
the $q_r$ similar boxes $A_r$. Also in the second distribution if the boxes
$B_1, B_2, \ldots B_t$ be identical we cannot alter the distribution by any
interchange of a pair of the sets of $q_1, q_2, \ldots q_t$ objects. Hence there is a
one-to-one correspondence and we may state that the number of
distributions of objects of specification $(p_1 p_2 \ldots p_s)$ into boxes of specification
$(q_1 q_2 \ldots q_t)$, the numbers $q_1, q_2, \ldots q_t$ being all different, one object being
placed in each box, is equal to the number of distributions of objects of
specification $(p_1 p_2 \ldots p_s)$ into boxes of specification $(t)$ such that the $t$
boxes contain $q_1, q_2, \ldots q_t$ objects respectively. For example, compare
these distributions where

$$(p_1 p_2 \ldots p_s) = (321), \quad (q_1 q_2 \ldots q_t) = (321),$$

    A₁ A₁ A₁    A₂ A₂    A₃
                or
       A          A       A

      αβγ        αβ       α
      αββ        αγ       α
      ααγ        ββ       α
      ββγ        αα       α
      ααβ        βγ       α
      αβγ        αα       β
      ααγ        αβ       β
      ααβ        αγ       β
      ααα        βγ       β
      ααα        ββ       γ
      ααβ        αβ       γ
      αββ        αα       γ




51. Again, if the numbers $q_1, q_2, \ldots q_t$ be identical and the boxes
$B_1, B_2, \ldots B_t$ have the specification $(1^t)$ we find that in the first
distribution the sets of $q_1, q_2, \ldots q_t$ objects can be permuted in all possible
ways so as to produce new distributions, the number of ways depending
upon the similarities that may exist between the $t$ sets of objects. Also
in the second distribution, since the boxes are all different, the sets of
objects can be permuted exactly as in the first distribution, and we
may say that the number of distributions of objects of specification
$(p_1 p_2 \ldots p_s)$ into boxes of specification $(q_1 q_2 \ldots q_t)$, the numbers $q_1, q_2, \ldots q_t$
being identical, one object being placed in each box, is equal to the
number of distributions of objects of specification $(p_1 p_2 \ldots p_s)$ into boxes
of specification $(1^t)$ such that the $t$ boxes contain in some order
$q_1, q_2, \ldots q_t$ objects respectively. As an example we may compare the
distributions of objects of specification (321) into the boxes

    A₁ A₁ A₂ A₂ A₃ A₃   and   A₁ A₂ A₃,

where $(p_1 p_2 \ldots p_s) = (321)$, $(q_1 q_2 \ldots q_t) = (222)$.


CHAPTER V 


DISTRIBUTIONS OF GIVEN SPECIFICATION 

52. In this chapter we examine the distribution theory that has
just been before us with special reference to the specifications of the
distributions. In a product-sum such as $h_3$, for example

$$(3) + (21) + (1^3),$$

the occurrence of a part 1, 2, or 3 in a partition indicates that 1, 2, or
3 similar parts have been placed in similar boxes and it was by restricting
the magnitude of these parts to be not greater than $k$ that we were able
to determine the theory of the distribution when the condition was that
not more than $k$ similar objects were to be placed in similar boxes. In
order to put in evidence the specifications of the distributions we
consider in connexion with the product-sums $h_1, h_2, h_3, \ldots$ the new functions

$$\begin{aligned}
X_1 &= x_1 (1), \\
X_2 &= x_2 (2) + x_1^2 (1^2), \\
X_3 &= x_3 (3) + x_2 x_1 (21) + x_1^3 (1^3), \\
X_4 &= x_4 (4) + x_3 x_1 (31) + x_2^2 (2^2) + x_2 x_1^2 (21^2) + x_1^4 (1^4).
\end{aligned}$$

We may if we choose regard $x_1, x_2, x_3, \ldots$ as being the elementary
symmetric functions of a new set of elements

$$\alpha', \beta', \gamma', \ldots .$$

Indicating symmetric functions of this set by dashed brackets, viz.
$(\;)'$, the relations may be written

$$\begin{aligned}
X_1 &= (1)'(1), \\
X_2 &= (1^2)'(2) + (2)'(1^2) + 2 (1^2)'(1^2), \\
X_3 &= (1^3)'(3) + 3 (1^3)'(21) + (21)'(21) + 6 (1^3)'(1^3) \\
&\quad + (3)'(1^3) + 3 (21)'(1^3), \\
&\text{etc.}
\end{aligned}$$
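As a check on the second of these relations, here is a short worked expansion. It assumes only what the text has just stated, namely that $x_1 = (1)'$ and $x_2 = (1^2)'$ are the elementary functions of the dashed set.

```latex
\begin{aligned}
x_1^2 &= \bigl[(1)'\bigr]^2 = (2)' + 2\,(1^2)', \\
X_2 &= x_2\,(2) + x_1^2\,(1^2) \\
    &= (1^2)'(2) + (2)'(1^2) + 2\,(1^2)'(1^2).
\end{aligned}
```

The same substitution, with $x_2 x_1 = (21)' + 3\,(1^3)'$ and $x_1^3 = (3)' + 3\,(21)' + 6\,(1^3)'$, produces the relation for $X_3$.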

and we, at once, notice a symmetry in the right-hand sides of these
relations. They are unaltered by an interchange of dashed and undashed
brackets or in other words, by an interchange of the sets of
quantities $\alpha, \beta, \gamma, \ldots$; $\alpha', \beta', \gamma', \ldots$. To prove that this symmetry is
universal consider the infinite series

$$1 + X_1 + X_2 + X_3 + \ldots,$$

which is expressible as the product

$$\begin{aligned}
&(1 + x_1 \alpha + x_2 \alpha^2 + x_3 \alpha^3 + \ldots) \\
\times\; &(1 + x_1 \beta + x_2 \beta^2 + x_3 \beta^3 + \ldots) \\
\times\; &(1 + x_1 \gamma + x_2 \gamma^2 + x_3 \gamma^3 + \ldots) \\
\times\; &\ldots,
\end{aligned}$$

because the coefficient of $x_\lambda x_\mu x_\nu \ldots$ therein is

$$\sum \alpha^\lambda \beta^\mu \gamma^\nu \ldots = (\lambda \mu \nu \ldots).$$

Since $x_1, x_2, x_3, \ldots$ are the elementary functions of $\alpha', \beta', \gamma', \ldots$

$$(1 + x_1 \alpha + x_2 \alpha^2 + x_3 \alpha^3 + \ldots) = (1 + \alpha'\alpha)(1 + \beta'\alpha)(1 + \gamma'\alpha) \ldots,$$

so that also

$$\begin{aligned}
1 + X_1 + X_2 + X_3 + \ldots
= {} &(1 + \alpha'\alpha)(1 + \beta'\alpha)(1 + \gamma'\alpha) \ldots \\
\times\; &(1 + \alpha'\beta)(1 + \beta'\beta)(1 + \gamma'\beta) \ldots \\
\times\; &(1 + \alpha'\gamma)(1 + \beta'\gamma)(1 + \gamma'\gamma) \ldots \\
\times\; &\ldots ;
\end{aligned}$$

a relation which establishes the symmetry, for the right-hand side is
unaltered by the interchange of dashed and undashed letters.

53. We have to deal at present with the set of relations which
commences with $X_1 = x_1 (1)$.

Taking any product of the functions $X$, say for example $X_4 X_3$, we
find that we can arrange the right-hand side according to products of
quantities $x_1, x_2, \ldots$. In particular, selecting the term which involves
$x_3 x_2 x_1^2$, we have

$$X_4 X_3 = \ldots + \{(21^2)(3) + (31)(21)\}\, x_3 x_2 x_1^2 + \ldots .$$

The function

$$(21^2)(3) + (31)(21)$$

is associated with two partitions of the number 7; (43) which defines
the $X$ product and $(321^2)$ which defines the $x$ product. The numbers
which appear in the two functions $(21^2)(3)$, $(31)(21)$ are those which
appear in the $x$ product and moreover each function involves partitions
of the numbers 4, 3 which appear in the $X$ product.

The symmetric function products $(21^2)(3)$, $(31)(21)$ are derived from
the symmetric function $(321^2)$ by a process called ‘Separation’ and each
is said to be a ‘Separation’ of $(321^2)$. Each factor of such a product
is said to be a Separate of the ‘Separation.’ The like terms are employed
when we are thinking only of Partitions. A partition is separated
into separates just as a number is partitioned into parts.




Separation consists in separating combinations of parts by distinct
brackets. Thus

$$\begin{gathered}
(321^2),\ (321)(1),\ (31^2)(2),\ (32)(1^2),\ (21^2)(3),\ (31)(21),\ (32)(1)^2, \\
(31)(2)(1),\ (3)(21)(1),\ (3)(2)(1^2), \\
(3)(2)(1)^2,
\end{gathered}$$

are all separations alike of the function $(321^2)$ and of the partition
$(321^2)$.

We may therefore say that the two terms of $(21^2)(3) + (31)(21)$ are,
both, separations of the function $(321^2)$.

A separation has a specification which consists of the series of
numbers which denote the sums of the numbers in separate brackets
or as we may say in the separates. Thus the eleven separations above
written have specifications

$$\begin{gathered}
(7),\ (61),\ (52),\ (52),\ (43),\ (43),\ (51^2), \\
(421),\ (3^2 1),\ (32^2), \\
(321^2).
\end{gathered}$$
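The eleven separations and their specifications can be produced by machine. This is a sketch under the definition just given; the names `separations` and `specification`, and the choice to record a separation as a tuple of separates, are assumptions of the example.

```python
def separations(parts):
    """All separations of a partition: ways of grouping its parts into
    separates, each separation recorded as a sorted tuple of separates."""
    found = set()

    def place(i, blocks):
        if i == len(parts):
            separates = (tuple(sorted(b, reverse=True)) for b in blocks)
            found.add(tuple(sorted(separates, reverse=True)))
            return
        for b in range(len(blocks)):      # put part i into an existing separate
            blocks[b].append(parts[i])
            place(i + 1, blocks)
            blocks[b].pop()
        blocks.append([parts[i]])         # or open a new separate with it
        place(i + 1, blocks)
        blocks.pop()

    place(0, [])
    return found

def specification(separation):
    """Sums of the separates, in descending order."""
    return tuple(sorted((sum(s) for s in separation), reverse=True))

seps = separations((3, 2, 1, 1))
with_43 = {s for s in seps if specification(s) == (4, 3)}
print(len(seps), sorted(with_43))
```

Collecting the results in a set removes the repetitions that arise because the two unit parts are indistinguishable.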

Hence the terms of $(21^2)(3) + (31)(21)$ may be fully described as
being separations, of the partition $(321^2)$ which defines the $x$ product,
which have the specification (43) which defines the $X$ product. The
terms $(21^2)(3)$, $(31)(21)$ each appear above with the coefficient unity
because in the associated $X$ product no exponent exceeds unity. Had
we chosen the product $X_3^2 X_2^3$ we would have obtained a term

$$2 (3)(21) \cdot 3 (2)^2 (1^2)\, x_3 x_2^3 x_1^3$$

such that $(3)(21)(2)^2(1^2)$ is a separation of $(32^3 1^3)$ of specification $(3^2 2^3)$
and the coefficient $3 \times 2$ that presents itself denotes that the separation
$(2)^2 (1^2)$, composed of separates of the same weight, has three
permutations; and similarly that the separation $(3)(21)$, also composed of
separates of the same weight, has two permutations.

We may say that in the $X$ product the coefficient of a separation is
equal to the number of permutations of the separates when only
permutations between separates of the same weight are permitted.

54. Take now the general $X$ product

$$X_{q_1} X_{q_2} \ldots X_{q_t} = \ldots + P\, x_{\sigma_1} x_{\sigma_2} x_{\sigma_3} \ldots + \ldots .$$

We see that

(i) $P$ is a linear function of separations of $(\sigma_1 \sigma_2 \sigma_3 \ldots)$.

(ii) Each separation that appears has the specification $(q_1 q_2 \ldots q_t)$
and every such separation presents itself.

(iii) The numerical coefficient of a separation is equal to the
number of permutations of its separates when only
permutations between separates of the same weight are permitted.

We now expand $P$ in a series of monomials so that

$$P = \ldots + \theta\, (p_1 p_2 \ldots p_s) + \ldots$$

and

$$X_{q_1} X_{q_2} \ldots X_{q_t} = \ldots + \theta\, (p_1 p_2 \ldots p_s)\, x_{\sigma_1} x_{\sigma_2} x_{\sigma_3} \ldots + \ldots .$$

We gather that objects of specification

$$(p_1 p_2 \ldots p_s)$$

can be distributed into boxes of specification

$$(q_1 q_2 \ldots q_t),$$

one object in each box, so that the distributions have, all, the
specification

$$(\sigma_1 \sigma_2 \sigma_3 \ldots)$$

in $\theta$ ways.

55. It has been seen in the foregoing chapter that we can interchange
the specifications of the objects and boxes without altering the
specifications of the distributions or the number $\theta$. Hence we have a
law of algebraic symmetry indicated by the complementary formula

$$X_{p_1} X_{p_2} \ldots X_{p_s} = \ldots + \theta\, (q_1 q_2 \ldots q_t)\, x_{\sigma_1} x_{\sigma_2} x_{\sigma_3} \ldots + \ldots .$$

As an example we develop the term

$$\{(21^2)(3) + (31)(21)\}$$

which appears in the product $X_4 X_3$, and we find

$$\begin{aligned}
X_4 X_3 = \ldots + \{&(52) + 3 (51^2) + (43) + 2 (421) + 2 (3^2 1) \\
&+ 2 (32^2) + 3 (321^2)\}\, x_3 x_2 x_1^2 + \ldots
\end{aligned}$$

and we interpret any particular term, say $3 (51^2)$, by stating that
objects of specification $(51^2)$ can be distributed into boxes of specification
(43), one object in each box, in such wise that the distribution has
a specification $(321^2)$ in 3 ways. These are in fact

    A A A A   B B B

    α α α β   α α γ
    α α α γ   α α β
    α α β γ   α α α

the specifications of the distributions being shewn by

    A A A | B B | A | B
    α α α | α α | β | γ

    A A A | B B | A | B
    α α α | α α | γ | β

    B B B | A A | A | A
    α α α | α α | β | γ


56. To shew the reciprocity we calculate

$$X_5 X_1^2 = \ldots + 3 (43)\, x_3 x_2 x_1^2 + \ldots$$

and the distributions are

    A A A A A   B   C

    α α α β β   α   β
    α α α β β   β   α
    α α β β β   α   α

3 in number and each of specification $(321^2)$.
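The reciprocity of Arts. 55 and 56 can be confirmed by counting distributions according to their specification. The sketch assumes the array picture of a distribution (entry = number of objects of one kind placed in one group of similar boxes), so that the specification of the distribution is the collection of non-zero entries; the name `by_specification` is hypothetical.

```python
from collections import Counter
from itertools import product

def by_specification(objects, boxes):
    """Tally the distributions of `objects` into `boxes` by the specification
    of the distribution (the non-zero array entries, in descending order)."""
    tally = Counter()

    def build(i, cols, entries):
        if i == len(objects):
            if not any(cols):
                tally[tuple(sorted(entries, reverse=True))] += 1
            return
        for row in product(*(range(c + 1) for c in cols)):
            if sum(row) == objects[i]:
                left = tuple(c - r for c, r in zip(cols, row))
                build(i + 1, left, entries + [r for r in row if r])

    build(0, tuple(boxes), [])
    return tally

print(by_specification((5, 1, 1), (4, 3)))   # objects (51^2), boxes (43)
print(by_specification((4, 3), (5, 1, 1)))   # objects (43), boxes (51^2)
```

Both tallies agree term by term, since interchanging objects and boxes only transposes each array and leaves its entries unchanged.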

57. The symbol $D_m$ can be employed with good effect because if
we operate upon

$$X_{q_1} X_{q_2} \ldots X_{q_t}$$

with

$$D_{p_1} D_{p_2} \ldots D_{p_s}$$

we obtain a linear function of $x$ products which gives a complete
specification account of the distributions of the objects into the boxes.

We proceed from the relation

$$D_s X_q = x_s X_{q-s},$$

valid for all integer values of $s$ and also when $s = 0$ if we put $x_0 = 1$.

To take an example consider objects and boxes of the specifications
$(2^2 1^3)$, (43) respectively and recall the way in which the symbol $D_m$
operates upon a product through the compositions of its suffix. The
calculation is

$$\begin{aligned}
D_2^2 D_1^3 X_4 X_3
&= D_2 D_1^3 (x_2 X_3 X_2 + x_2 X_4 X_1 + x_1^2 X_3 X_2) \\
&= D_1^3 \{(x_2 + x_1^2)(x_2 X_3 + x_2 X_2 X_1 + x_1^2 X_2 X_1) \\
&\qquad\quad + x_2 (x_2 X_2 X_1 + x_1^2 X_3)\} \\
&= D_1^3 \{(x_2^2 + 2 x_2 x_1^2) X_3 + (2 x_2^2 + 2 x_2 x_1^2 + x_1^4) X_2 X_1\} \\
&= D_1^2 \{(x_2^2 x_1 + 2 x_2 x_1^3) X_2 + (2 x_2^2 x_1 + 2 x_2 x_1^3 + x_1^5) X_2 \\
&\qquad\quad + (2 x_2^2 x_1 + 2 x_2 x_1^3 + x_1^5) X_1^2\} \\
&= D_1^2 \{(3 x_2^2 x_1 + 4 x_2 x_1^3 + x_1^5) X_2 + (2 x_2^2 x_1 + 2 x_2 x_1^3 + x_1^5) X_1^2\} \\
&= D_1 (7 x_2^2 x_1^2 + 8 x_2 x_1^4 + 3 x_1^6) X_1 \\
&= 7 x_2^2 x_1^3 + 8 x_2 x_1^5 + 3 x_1^7,
\end{aligned}$$




and the distributions indicated are

    Spec. (2²1³)          (21⁵)                 (1⁷)

    A A A A  B B B        A A A A  B B B        A A A A  B B B

    α α β β  γ δ ε        α α β γ  β δ ε        α β γ δ  α β ε
    α α γ δ  β β ε        α α β δ  β γ ε        α β γ ε  α β δ
    α α γ ε  β β δ        α α β ε  β γ δ        α β δ ε  α β γ
    α α δ ε  β β γ        α γ β β  α δ ε
    γ δ β β  α α ε        α δ β β  α γ ε
    γ ε β β  α α δ        α ε β β  α γ δ
    δ ε β β  α α γ        α γ δ ε  α β β
                          β γ δ ε  α α β

    No. 7                 8                     3
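The calculation of Art. 57 can be mechanised from the single relation $D_s X_q = x_s X_{q-s}$, the symbol acting on a product through the compositions of its suffix. The representation below is an assumption of this sketch: a term is stored as a pair (suffixes of the $x$ factors, suffixes of the $X$ factors), with $X_0 = 1$ kept as a suffix 0, and `apply_D` is a hypothetical name.

```python
from collections import Counter
from itertools import product

def apply_D(s, poly):
    """Operate with D_s on a sum of terms x_{...} X_{q1} X_{q2} ...,
    using D_c X_q = x_c X_{q-c} and x_0 = 1."""
    out = Counter()
    for (xs, qs), coeff in poly.items():
        # each composition of s into len(qs) parts, zeros allowed
        for comp in product(*(range(q + 1) for q in qs)):
            if sum(comp) == s:
                new_xs = tuple(sorted(xs + tuple(c for c in comp if c), reverse=True))
                new_qs = tuple(q - c for q, c in zip(qs, comp))
                out[(new_xs, new_qs)] += coeff
    return out

poly = Counter({((), (4, 3)): 1})        # the product X_4 X_3
for s in (2, 2, 1, 1, 1):                # D_2 twice, then D_1 three times
    poly = apply_D(s, poly)

result = {xs: c for (xs, qs), c in poly.items()}
print(result)
```

The keys of `result` are the suffixes of the $x$ products, so that `(2, 2, 1, 1, 1)` stands for $x_2^2 x_1^3$; the values are the numbers of distributions of each specification.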


58. It will be observed that, since $D_s X_q = x_s X_{q-s}$, the $x$ product
of highest degree obtained from

$$D_{p_1} D_{p_2} \ldots D_{p_s} X_{q_1} X_{q_2} \ldots X_{q_t}$$

must be $x_{p_1} x_{p_2} \ldots x_{p_s}$.

Again from the symmetry on the right-hand sides of the relations

$$\begin{aligned}
X_2 &= x_2 (2) + x_1^2 (1^2), \\
X_3 &= x_3 (3) + x_2 x_1 (21) + x_1^3 (1^3),
\end{aligned}$$

which was established in Art. 52, we may derive from the relation

$$D_s X_q = x_s X_{q-s}$$

the relation

$$D'_s X_q = a_s X_{q-s},$$

where the symbol $D'$ has reference to the symmetric functions of the
quantities $\alpha', \beta', \gamma', \ldots$ and as before $a_1, a_2, a_3, \ldots$ are the elementary
functions of the quantities $\alpha, \beta, \gamma, \ldots$.

This is so because an interchange of the sets

$$\alpha, \beta, \gamma, \ldots ; \quad \alpha', \beta', \gamma', \ldots$$

leaves $X_q$ and $X_{q-s}$ unaltered while changing $D_s$ into $D'_s$ and $x_s$ into $a_s$.
Similarly from the result

$$D_2^2 D_1^3 X_4 X_3 = 7 x_2^2 x_1^3 + 8 x_2 x_1^5 + 3 x_1^7,$$

we derive

$$D_2'^{\,2} D_1'^{\,3} X_4 X_3 = 7 a_2^2 a_1^3 + 8 a_2 a_1^5 + 3 a_1^7.$$

These transformations are of much service in the development of the
algebra.




59. In Art. 55 we have determined the specifications of the
distributions when we are given the specifications of the objects and boxes. We
can obtain all the distributions which have a given specification, the
specifications of the objects and boxes being at disposal by simply
expanding an $X$ product as a linear function of $x$ products. Thus since

$$\begin{aligned}
X_2^2 = {} & x_2^2 \{(4) + 2 (2^2)\} + x_2 x_1^2 \{2 (31) + 2 (21^2)\} \\
& + x_1^4 \{(2^2) + 2 (21^2) + 6 (1^4)\}
\end{aligned}$$

we gather that a distribution of specification $(2^2)$ can be obtained,
when the box specification is $(2^2)$, by distributing objects of specification
(4) in one way, and objects of specification $(2^2)$ in two ways; and
similarly the other two terms upon the right-hand side can be interpreted.

The distributions are

    (2²)
                A A B B

    (4)         α α α α
    (2²)        α α β β
                β β α α

    (21²)
                A A B B

    (31)        α α α β
                α β α α
    (21²)       α α β γ
                β γ α α

    (1⁴)
                A A B B

    (2²)        α β α β
    (21²)       α β α γ
                α γ α β
    (1⁴)        α β γ δ
                α γ β δ
                α δ β γ
                β γ α δ
                β δ α γ
                γ δ α β


60. In Art. 48 we shewed that the theory of a certain distribution
led easily to the enumeration of certain numbered diagrams which could
be accurately defined. The correspondence was obtained by an examination
of the way in which the operation of the symbol $D_m$ is effective
in obtaining the enumerating number. Looking back to Art. 57 we can
similarly examine the calculation involved in the expression

$$D_2^2 D_1^3 X_4 X_3.$$

The symbol $D_m$ is performed through the medium of the compositions
of the number $m$. If

$$c_1, c_2, \ldots c_t$$

be such a composition we may have to perform the symbols

$$D_{c_1}, D_{c_2}, D_{c_3}, \ldots D_{c_t}$$

upon the several factors of the $X$ product. Now since (Art. 57)

$$D_s X_q = x_s X_{q-s}$$

we see that, associated with the particular portion of the operation, we
will have a product

$$x_{c_1} x_{c_2} \ldots x_{c_t}$$

with coefficient unity, and not merely unity as is the case when we
are dealing with the functions $h_1, h_2, \ldots$.

Again operating as in Art. 48 through another composition $d_1, d_2, \ldots d_t$
we obtain another $x$ factor

$$x_{d_1} x_{d_2} \ldots x_{d_t}$$

the coefficient being again unity.

Finally we arrive at a certain $x$ product with the coefficient unity
and we find that corresponding to one of the distributions we have a
lettered diagram

$$\begin{array}{ccccc}
x_{c_1} & x_{c_2} & x_{c_3} & \ldots & x_{c_t} \\
x_{d_1} & x_{d_2} & x_{d_3} & \ldots & x_{d_t} \\
x_{e_1} & x_{e_2} & x_{e_3} & \ldots & x_{e_t} \\
\vdots & \vdots & \vdots & & \vdots \\
x_{s_1} & x_{s_2} & x_{s_3} & \ldots & x_{s_t}
\end{array}$$

and the product of these $st$ $x$ factors defines the specification of the
particular distribution which has led us to this diagram. Eliminating
the symbol $x$ from the diagram, as it has no numerical significance, we
write it

$$\begin{array}{ccccc}
c_1 & c_2 & c_3 & \ldots & c_t \\
d_1 & d_2 & d_3 & \ldots & d_t \\
e_1 & e_2 & e_3 & \ldots & e_t \\
\vdots & \vdots & \vdots & & \vdots \\
s_1 & s_2 & s_3 & \ldots & s_t
\end{array}$$



If the distributions considered have, all of them, the same specification
defined by the $x$ product resulting from the diagram, it is clear
that the diagrams must all give the product $x_{\sigma_1} x_{\sigma_2} x_{\sigma_3} \ldots$, where the
partition $(\sigma_1 \sigma_2 \sigma_3 \ldots)$ is the specification of the distributions. Hence
the numbers $c_1, c_2, \ldots d_1, d_2, \ldots s_1, s_2, \ldots$ in the numbered diagrams
must, when assembled, make up the partition $(\sigma_1 \sigma_2 \sigma_3 \ldots)$.

We may therefore state the theorem:

“Let there be a rectangular chess-board of $t$ columns and $s$ rows and
let given numbers $\sigma_1, \sigma_2, \ldots \sigma_\tau$, $\tau$ in number, be placed in the
compartments in such wise that the sums of the numbers in the successive rows
are $p_1, p_2, \ldots p_s$ and in the successive columns $q_1, q_2, \ldots q_t$, then the
number of such numbered chess-boards is equal to the number of ways
of distributing objects of specification $(p_1 p_2 \ldots p_s)$ into boxes of specification
$(q_1 q_2 \ldots q_t)$ subject to the condition that the distributions are to
have the specification $(\sigma_1 \sigma_2 \ldots \sigma_\tau)$.”

61. As an example, by means of the result

$$D_2^2 D_1^3 X_4 X_3 = 7 x_2^2 x_1^3 + 8 x_2 x_1^5 + 3 x_1^7,$$

we enumerate the 7 diagrams

in which the compartment numbers are 2, 2, 1, 1, 1 derived from the
specification $(2^2 1^3)$ of the corresponding distributions; the row sums are
2, 2, 1, 1, 1 derived from the specification $(2^2 1^3)$ of the objects
distributed, and the column sums are 4, 3 derived from the specification
(43) of the boxes.

Also we enumerate the 8 diagrams 



associated with the compartment numbers 2, 1, 1, 1, 1, 1, and the 3
diagrams

[Diagrams not reproduced: three boards whose compartments contain only units.]

associated with the compartment numbers 1, 1, 1, 1, 1, 1, 1, the row
and column sums being, as before, derived from the partitions $(2^2 1^3)$,
(43). (Compare Art. 49.)
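The three counts of Art. 61 can be checked by direct enumeration. The following Python sketch is not part of the original text; the function names are illustrative assumptions. It lists every board of non-negative integers with row sums 2, 2, 1, 1, 1 and column sums 4, 3, and groups the boards by the partition formed by their non-zero compartment numbers.

```python
from collections import Counter
from itertools import product


def numbered_boards(row_sums, col_sums):
    """Yield every board of non-negative integers with the given row and column sums."""
    n_cols = len(col_sums)

    def rows_for(total):
        # All ordered ways of writing `total` as a sum of n_cols non-negative integers.
        return [r for r in product(range(total + 1), repeat=n_cols) if sum(r) == total]

    for board in product(*(rows_for(p) for p in row_sums)):
        if all(sum(row[j] for row in board) == col_sums[j] for j in range(n_cols)):
            yield board


def count_by_specification(row_sums, col_sums):
    """Group the boards by the partition made up of their non-zero compartment numbers."""
    counts = Counter()
    for board in numbered_boards(row_sums, col_sums):
        spec = tuple(sorted((v for row in board for v in row if v), reverse=True))
        counts[spec] += 1
    return dict(counts)


counts = count_by_specification((2, 2, 1, 1, 1), (4, 3))
for spec, number in sorted(counts.items(), reverse=True):
    print(spec, number)
```

The enumeration is exhaustive rather than clever, which is adequate for boards of this size; it is meant only as a check on the coefficients 7, 8 and 3.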


CHAPTER VI 

THE MOST GENERAL CASE OF DISTRIBUTION 

62. So far two main divisions of the Theory of Distribution have 
been under consideration, viz. the case in which there are no similari- 
ties between the boxes and the case in which the number of boxes is 
equal to the number of objects. In the former the objects may be of 
any specification ; in the latter both the objects and the boxes may be 
of any specification. The next main division that presents itself for 
examination is concerned with boxes which may be any in number but 
in every case indistinguishable from one another. They have the 
specification ( m ) when they are m in number. The objects may be any 
in number and of any specification. No box is supposed to be left 
empty so that the objects are at least as numerous as the boxes. 

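Objects of the specification $(p_1 p_2 \ldots p_s)$ are in correspondence with
the assemblage of letters

$$\alpha_1^{p_1} \alpha_2^{p_2} \ldots \alpha_s^{p_s} \quad\text{or}\quad \alpha^{p_1} \beta^{p_2} \ldots \sigma^{p_s} \quad\text{or}\quad \alpha^{p} \beta^{q} \gamma^{r} \ldots$$

63. The partition $(p_1 p_2 \ldots p_s)$ may, from another standpoint, be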
regarded as a multipartite number or, in other words, as a succession 
of numbers which enumerate letters or objects of different kinds. 

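If we separate any combination of letters from the assemblage

$$\alpha^{p} \beta^{q} \gamma^{r} \ldots,$$

say $\alpha^{p_1} \beta^{q_1} \gamma^{r_1} \ldots$,

the numbers $p_1, q_1, r_1, \ldots$ are not necessarily or generally in descending
order of magnitude and some of them may be zeros. If we break up
the assemblage into $m$ portions

$$\alpha^{p_1}\beta^{q_1}\gamma^{r_1}\ldots \mid \alpha^{p_2}\beta^{q_2}\gamma^{r_2}\ldots \mid \ldots \mid \alpha^{p_m}\beta^{q_m}\gamma^{r_m}\ldots$$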

without any regard to the order of writing the portions, we may speak 
of a distribution of objects of specification (pqr ...) into boxes of speci- 
fication (m) because no permutation of the boxes, which are all similar, 
alters the distribution. In correspondence we speak of partitioning the 
multipartite number into m multipartite parts and we denote such 
partition by the notation 

$$(p_1 q_1 r_1 \ldots,\; p_2 q_2 r_2 \ldots,\; \ldots,\; p_m q_m r_m \ldots).$$

The parts may be placed in any order without affecting the partition. 



Thence it arises that the problem of distribution into similar boxes 
is identical with that of partitioning a multipartite number. 

It will be remarked that a collection of integers in a bracket may 
denote either a partition of an ordinary or unipartite number or a multi- 
partite number, but that whereas the parts of the partition in the 
former case may always be written in descending order, such is not the 
case with the constituents of the multipartite parts of a multipartite 
number. 

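As a simple example of the correspondence between distribution
and partition, take the assemblage $\alpha^2\beta^2$.

$$\begin{array}{cc|c}
\multicolumn{2}{c|}{\text{Distribution of } \alpha^2\beta^2 \text{ into two similar boxes}} & \text{Partitions of } (22) \text{ into two parts} \\
A & A & \\
\alpha^2\beta & \beta & (21,\ 01) \\
\alpha\beta^2 & \alpha & (12,\ 10) \\
\alpha^2 & \beta^2 & (20,\ 02) \\
\alpha\beta & \alpha\beta & (11,\ 11)
\end{array}$$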
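The correspondence can be illustrated by listing the partitions mechanically. The sketch below is an illustration only, with a hypothetical function name; it returns the partitions of a multipartite number into exactly $m$ parts, none of which consists wholly of zeros, the order of the parts being disregarded.

```python
from itertools import product


def multipartite_partitions(number, m):
    """Partitions of a multipartite number into exactly m non-zero multipartite parts.

    Each partition is returned as a tuple of parts in a fixed (lexicographic) order,
    so that rearrangements of the same parts are counted once only.
    """
    # Every candidate part: a vector not exceeding `number`, not wholly zero.
    parts = [p for p in product(*(range(n + 1) for n in number)) if any(p)]
    found = set()

    def extend(chosen, remaining, start):
        if len(chosen) == m:
            if not any(remaining):
                found.add(tuple(chosen))
            return
        # Choosing parts with non-decreasing index avoids repeating a partition.
        for idx in range(start, len(parts)):
            part = parts[idx]
            if all(a <= b for a, b in zip(part, remaining)):
                rest = tuple(b - a for a, b in zip(part, remaining))
                extend(chosen + [part], rest, idx)

    extend([], tuple(number), 0)
    return sorted(found)


result = multipartite_partitions((2, 2), 2)
for partition in result:
    print(partition)
```

For the bipartite number (22) this yields the four partitions tabulated above, each part being written as a pair of exponents of $\alpha$ and $\beta$.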


64. In the main divisions previously discussed we have had to
deal with the homogeneous product-sums of the elements $\alpha, \beta, \gamma, \ldots$.
In the present main division we have also to deal with homogeneous
product-sums, not of the simple elements but of certain combinations
of them, $u_1, u_2, u_3, \ldots$.

A reference to Art. 8 shews that we can arrive at the product-sums
by first obtaining the power-sums.

Thus if $u_1^k + u_2^k + u_3^k + \ldots = \sigma_k$,

and $U_1, U_2, U_3, \ldots$ denote the product-sums,

$$U_1 = \sigma_1,$$
$$2!\,U_2 = \sigma_1^2 + \sigma_2,$$
$$3!\,U_3 = \sigma_1^3 + 3\sigma_1\sigma_2 + 2\sigma_3,$$
$$4!\,U_4 = \sigma_1^4 + 6\sigma_1^2\sigma_2 + 3\sigma_2^2 + 8\sigma_1\sigma_3 + 6\sigma_4,$$
$$\ldots$$
$$m!\,U_m = \sum \frac{m!}{m_1!\,m_2!\,m_3!\ldots} \left(\frac{\sigma_1}{1}\right)^{m_1} \left(\frac{\sigma_2}{2}\right)^{m_2} \left(\frac{\sigma_3}{3}\right)^{m_3} \ldots$$

We have to determine the particular combinations of $\alpha, \beta, \gamma, \ldots$
that we may substitute for $u_1, u_2, u_3, \ldots$, so as to be of service in the
problem before us.
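The passage from power-sums to product-sums may be tested numerically. In the sketch below (an illustration, not the author's procedure; all names are assumptions) the sum in the general formula is taken over the partitions of $m$, the number $m_i$ being the number of times the part $i$ occurs, and the result is compared with the product-sum computed directly from its definition.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial, prod


def product_sum_direct(values, m):
    """Homogeneous product-sum of order m: all products of m elements, repetition allowed."""
    return sum(prod(c) for c in combinations_with_replacement(values, m))


def partitions(m, largest=None):
    """Yield the partitions of m as lists of parts in descending order."""
    if largest is None:
        largest = m
    if m == 0:
        yield []
        return
    for first in range(min(m, largest), 0, -1):
        for rest in partitions(m - first, first):
            yield [first] + rest


def product_sum_from_power_sums(values, m):
    """U_m = sum over partitions of m of prod_i (sigma_i / i)^(m_i) / m_i!."""
    sigma = {k: sum(v ** k for v in values) for k in range(1, m + 1)}
    total = Fraction(0)
    for lam in partitions(m):
        term = Fraction(1)
        for i in set(lam):
            m_i = lam.count(i)  # multiplicity of the part i
            term *= Fraction(sigma[i], i) ** m_i / factorial(m_i)
        total += term
    return total


values = [2, 3, 5]
for order in range(1, 6):
    print(order, product_sum_direct(values, order), product_sum_from_power_sums(values, order))
```

Exact rational arithmetic is used so that the division by $i^{m_i}\, m_i!$ introduces no rounding.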



If we take m = 1, so that there is but a single box, we note that for 
any assemblage of objects 

$$\alpha^p \beta^q \gamma^r \ldots$$

there is only one distribution; the whole of the objects must be placed 
in the only box. Hence the symmetric-function enumerating function 
must be the sum of all the monomial functions of all weights. We may 
take it to be 

$$h_1 + h_2 + h_3 + \ldots \text{ ad inf.}$$

$$= (1) + (2) + (1^2) + (3) + (21) + (1^3) + \ldots,$$

because the coefficient of the function ( pqr ...) in the series of func- 
tions is unity. 

When m = 2, we may place in the two boxes any two assemblages 
which, added together, make the assemblage to be distributed. Regarding 

$$\alpha^p \beta^q \gamma^r \ldots$$

as a literal product, we have two products whose product is equal to
the given product. Now it is evident that if $P_1$, $P_2$ be two such products,
the distribution must be of one of the types

$$\begin{array}{cc}
A & A \\
P_1 & P_1
\end{array}
\qquad
\begin{array}{cc}
A & A \\
P_1 & P_2
\end{array}$$

so that the product distributed must be either $P_1^2$ or $P_1 P_2$, where
$P_1$, $P_2$ separately may be any combination of letters. Hence every
possible distribution will be realised for all specifications of the objects 
to be distributed by taking the product-sums of order two of all com- 
binations of letters. The enumerating function must therefore be the 
sum of such product-sums of all weights. 

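Similarly when $m = 3$, the distribution must be of one of the types

$$\begin{array}{ccc}
A & A & A \\
P_1 & P_1 & P_1
\end{array}
\qquad
\begin{array}{ccc}
A & A & A \\
P_1 & P_1 & P_2
\end{array}
\qquad
\begin{array}{ccc}
A & A & A \\
P_1 & P_2 & P_3
\end{array}$$

so that the product to be distributed must be either $P_1^3$ or $P_1^2 P_2$ or
$P_1 P_2 P_3$. Hence the enumerating function must be the sum of product-sums
of order three of all combinations of letters.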

By similar reasoning for m boxes the enumerating function must be 
the sum of product-sums of order m of all combinations of letters. 
These combinations are the terms of the infinite series 

$$h_1 + h_2 + h_3 + \ldots \text{ ad inf.}$$


65. If we proceed now from these combinations we will obtain a 
solution of the problem, but it is much better to include unity in the 
series of terms. If unity may be placed in any box instead of one of 
the above combinations it is clear that we will enumerate the distribu- 
tions into m or fewer boxes, and this will be quite satisfactory because 
we have only to subtract the function which enumerates the distribu- 
tions into m - 1 or fewer boxes in order to obtain the function which 
enumerates the distributions into m boxes, no box being empty. As the 
algebra is easier we adopt this course and put 

$$S_1 = 1 + h_1 + h_2 + h_3 + \ldots$$

$$= 1 + (1) + (2) + (1^2) + (3) + (21) + (1^3) + \ldots$$

The sum of the $k$th powers $S_k$, of all the terms appearing herein, is
obtained, as is readily realised, by multiplying every part which appears
in the partitions by $k$.

Hence

$$S_2 = 1 + (2) + (4) + (2^2) + (6) + (42) + (2^3) + \ldots,$$
$$S_3 = 1 + (3) + (6) + (3^2) + (9) + (63) + (3^3) + \ldots,$$
$$\ldots$$
$$S_k = 1 + (k) + (2k) + (k^2) + (3k) + (2k, k) + (k^3) + \ldots,$$

and thence if $U_1, U_2, U_3, \ldots$ be the product-sums,

$$U_1 = S_1,$$
$$2!\,U_2 = S_1^2 + S_2,$$
$$3!\,U_3 = S_1^3 + 3S_1S_2 + 2S_3,$$
$$4!\,U_4 = S_1^4 + 6S_1^2S_2 + 3S_2^2 + 8S_1S_3 + 6S_4,$$
$$\ldots$$
$$m!\,U_m = \sum \frac{m!}{m_1!\,m_2!\,m_3!\ldots} \left(\frac{S_1}{1}\right)^{m_1} \left(\frac{S_2}{2}\right)^{m_2} \left(\frac{S_3}{3}\right)^{m_3} \ldots$$



This is the expression of the enumerating function U m in which the 
coefficient of the function (pqr...) is to be taken. 


66. If, on development, we find that 

$$U_m = \ldots + \theta\,(pqr\ldots) + \ldots,$$
our operating symbols shew us that

$$D_p D_q D_r \ldots U_m = \theta\, D_p D_q D_r \ldots (pqr\ldots) = \theta,$$
since no other terms on the right-hand side survive the operations. We
must therefore learn how to operate with the $D$ symbol. It will be
remembered that the symbol $D_m$ causes every symmetric function,
whose partition does not involve the number $m$, to vanish, and that
when the number $m$ does appear it strikes out that number from the
partition once. Now the portion of $S_1$, that involves $m$ in partitions, is

$$(m) + (m1) + (m2) + (m1^2) + (m3) + (m21) + (m1^3) + \ldots \text{ ad inf.}$$

Hence $D_m S_1 = S_1$,

or every operative symbol leaves $S_1$ unaltered.

Also $D_{2m} S_2 = S_2$, $D_{2m+1} S_2 = 0$,

since $S_2$ does not involve any uneven number.

Generally $D_{im} S_i = S_i$ and $D_s S_i = 0$ unless $s$ is a multiple of $i$.

The effect of $D_m$ upon $S_1^{k_1}$ comes next for consideration. The symbol
operates through the compositions of $m$ into $k_1$ parts, zero being included
as a possible part.

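Thus for example, omitting the operator $D_0$ for convenience, replacing
$D_0 S_1$ by its value $S_1$,

$$\begin{aligned}
D_3 S_1^3 ={}& D_3S_1 \,.\, S_1 \,.\, S_1 + S_1 \,.\, D_3S_1 \,.\, S_1 + S_1 \,.\, S_1 \,.\, D_3S_1 \\
&+ D_2S_1 \,.\, D_1S_1 \,.\, S_1 + D_2S_1 \,.\, S_1 \,.\, D_1S_1 + S_1 \,.\, D_2S_1 \,.\, D_1S_1 \\
&+ D_1S_1 \,.\, D_2S_1 \,.\, S_1 + D_1S_1 \,.\, S_1 \,.\, D_2S_1 + S_1 \,.\, D_1S_1 \,.\, D_2S_1 \\
&+ D_1S_1 \,.\, D_1S_1 \,.\, D_1S_1 \\
={}& 10\,S_1^3,
\end{aligned}$$

and this coefficient of $S_1^3$ arises because the number of compositions of
the number 3 into 3 parts is 10.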

This example establishes that the effect of $D_m$ upon $S_1^{k_1}$ is to
multiply $S_1^{k_1}$ by a number equal to the number of compositions of $m$
into $k_1$ parts, zero counting as a part. Hence by Art. 30

$$D_m S_1^{k_1} = \binom{m + k_1 - 1}{m} S_1^{k_1}.$$
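The multiplier in this result is easily verified for small numbers. The following sketch (illustrative only; the function name is an assumption) lists the compositions of $m$ into $k$ parts, zero counting as a part, and compares their number with the binomial coefficient.

```python
from itertools import product
from math import comb


def compositions_with_zeros(m, k):
    """All ordered ways of writing m as a sum of k non-negative integers."""
    return [c for c in product(range(m + 1), repeat=k) if sum(c) == m]


# The case treated in the text: the number 3 composed into 3 parts.
three_into_three = compositions_with_zeros(3, 3)
print(len(three_into_three), comb(3 + 3 - 1, 3))
```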


67. The effect of $D_m$ upon $S_2^{k_2}$ depends upon the compositions of
$m$ into $k_2$ even parts, zero taking its place as an even part. Hence
unless $m$ be even it causes $S_2^{k_2}$ to vanish. Considering then the symbol
$D_{2m}$ we observe that the compositions of $2m$ into even parts are equal
in number to the whole number of compositions of $m$, for they are
obtainable by multiplying by 2 each part of the latter composition.

Hence

$$D_{2m} S_2^{k_2} = \binom{m + k_2 - 1}{m} S_2^{k_2}.$$

Generally there is no difficulty in establishing that

$$D_{im} S_i^{k_i} = \binom{m + k_i - 1}{m} S_i^{k_i},$$

while the result is zero if the suffix of the operative symbol is not a
multiple of $i$.




68. Finally, consider the value of 


$$D_m S_1^{k_1} S_2^{k_2} \ldots S_i^{k_i}.$$

In dealing with the compositions of $m$ into $k_1 + k_2 + \ldots + k_i$ parts,
zero counting as a part and retaining the factors of the operand in the
above order, there is no condition that must be fulfilled by the first
$k_1$ parts of the composition; the next $k_2$ parts must be multiples of 2;
the next $k_3$ parts must be multiples of 3; ..., and finally, the last
$k_i$ parts must be multiples of $i$. Unless these conditions are satisfied
the result of the operation derived from the composition will be zero.
The complete result of the operation is the multiplication of the operand
by an integer equal to the number of the compositions of $m$ which have
the properties above set forth. This integer is equal to the coefficient
of $x^m$ in the development of the algebraic product

$$(1 + x + x^2 + \ldots)^{k_1} (1 + x^2 + x^4 + \ldots)^{k_2} \ldots (1 + x^i + x^{2i} + \ldots)^{k_i},$$

because in the ordered multiplication an exponent of $x$ can only be
made up of

$k_1$ numbers each divisible by unity,
$k_2$ numbers each divisible by 2,
$k_3$ numbers each divisible by 3,
...
$k_i$ numbers each divisible by $i$.

Hence

$$D_m S_1^{k_1} S_2^{k_2} \ldots S_i^{k_i} = \text{coefficient of } x^m \text{ in } (1-x)^{-k_1}(1-x^2)^{-k_2} \ldots (1-x^i)^{-k_i}\, S_1^{k_1} S_2^{k_2} \ldots S_i^{k_i}.$$
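The integer multiplier of Art. 68 may be computed in two ways, by counting the admissible compositions and by developing the algebraic product. The sketch below is a check under the stated rule, not a reproduction of the author's working; the names `constrained_compositions` and `series_coefficient` are assumptions.

```python
from itertools import product


def constrained_compositions(m, ks):
    """Count compositions of m into sum(ks) parts, zero counting as a part, in which
    the first ks[0] parts are multiples of 1, the next ks[1] multiples of 2, and so on."""
    divisors = [i for i, k in enumerate(ks, start=1) for _ in range(k)]
    choices = [range(0, m + 1, d) for d in divisors]
    return sum(1 for c in product(*choices) if sum(c) == m)


def series_coefficient(m, ks):
    """Coefficient of x^m in the product of (1 - x^i)^(-ks[i-1]) over i = 1, 2, ..."""
    coeffs = [1] + [0] * m
    for i, k in enumerate(ks, start=1):
        for _ in range(k):
            # Multiplying a series by 1/(1 - x^i) amounts to this running sum.
            for n in range(i, m + 1):
                coeffs[n] += coeffs[n - i]
    return coeffs[m]


for m in range(0, 8):
    print(m, constrained_compositions(m, (2, 1, 1)), series_coefficient(m, (2, 1, 1)))
```

The tuple `(2, 1, 1)` stands for the operand $S_1^2 S_2 S_3$ and is chosen arbitrarily for the illustration.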


69. In the light of this result consider the enumeration of the 
distributions of objects of specification ( pqr ...) into two or fewer 
similar boxes, or, what is the same question, the enumeration of the 
partitions of the multipartite number ( pqr ...) into two or fewer parts. 
By Art. 62 we seek the coefficient of the function $(pqr\ldots)$ in the
development of

$$U_2 = \frac{1}{2!}(S_1^2 + S_2).$$

This is equal to the first term in

$$D_p D_q D_r \ldots \frac{1}{2!}(S_1^2 + S_2),$$

which materialises when, after the operations, we put each of the
quantities $S_1$, $S_2$ equal to unity.


Now by the theorem that has been established in Art. 65

$$D_{p_1} \frac{1}{2!}(S_1^2 + S_2)$$

is equal to the coefficient of $x_1^{p_1}$ in

$$\frac{1}{2!}\left\{\frac{S_1^2}{(1-x_1)^2} + \frac{S_2}{1-x_1^2}\right\} = \frac{1}{2}\,\frac{(S_1^2 + S_2) + x_1(S_1^2 - S_2)}{(1-x_1)(1-x_1^2)}.$$

Hence the coefficient of the function $(p_1)$ is, putting $S_1 = S_2 = 1$, the
coefficient of $x_1^{p_1}$ in

$$\frac{1}{(1-x_1)(1-x_1^2)}.$$

This number enumerates the partitions of the (unipartite) number $p_1$
into two or fewer parts and solves the corresponding problem in distributions.

This result is of course well known since the time of Euler. 

70. Proceeding from the result

$$D_{p_1} \frac{1}{2!}(S_1^2 + S_2) = \text{coeff. of } x_1^{p_1} \text{ in } \frac{1}{2}\left\{\frac{S_1^2}{(1-x_1)^2} + \frac{S_2}{1-x_1^2}\right\},$$

we can further operate with the symbol $D_{p_2}$ and find that

$$D_{p_1} D_{p_2} \frac{1}{2!}(S_1^2 + S_2) = \text{coeff. of } x_1^{p_1}x_2^{p_2} \text{ in } \frac{1}{2}\left\{\frac{S_1^2}{(1-x_1)^2(1-x_2)^2} + \frac{S_2}{(1-x_1^2)(1-x_2^2)}\right\},$$

shewing us that the coeff. of the function $(p_1 p_2)$ is equal to the coeff.
of $x_1^{p_1} x_2^{p_2}$ in

$$\frac{1}{2}\left\{\frac{1}{(1-x_1)^2(1-x_2)^2} + \frac{1}{(1-x_1^2)(1-x_2^2)}\right\} = \frac{1 + x_1x_2}{(1-x_1)(1-x_1^2)\,.\,(1-x_2)(1-x_2^2)}.$$

This number enumerates the partitions of the bipartite number $(p_1 p_2)$
into two or fewer parts and solves the corresponding problem in distributions.




Further, if we denote by $P(pq, 2)$, $P(p, 2)$ the numbers of the
partitions of $(pq)$ and $(p)$ into two or fewer parts we see that we may
write

$$P(p_1 p_2, 2) = P(p_1, 2)\,P(p_2, 2) + P(p_1 - 1, 2)\,P(p_2 - 1, 2),$$

a convenient formula. As an example

$$P(33, 2) = \{P(3, 2)\}^2 + \{P(2, 2)\}^2,$$

and observing that the numbers 3, 2 have each of them 2 partitions
into 2 or fewer parts

$$P(33, 2) = 2^2 + 2^2 = 8.$$

The 8 partitions, thus enumerated, are

(33), (32, 01), (23, 10), (31, 02),

(13, 20), (22, 11), (21, 12), (30, 03).

In general, since $P(2p, 2) = p + 1 = P(2p + 1, 2)$, we have the
formulae

$$P(2p_1, 2p_2, 2) = (p_1 + 1)(p_2 + 1) + p_1p_2,$$
$$P(2p_1, 2p_2 + 1, 2) = (2p_1 + 1)(p_2 + 1),$$
$$P(2p_1 + 1, 2p_2 + 1, 2) = 2(p_1 + 1)(p_2 + 1).$$
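The convenient formula of Art. 70 lends itself to a direct test. In the following sketch (an illustration with assumed function names) the partitions of a bipartite number into two or fewer parts are counted as unordered pairs of parts, a part consisting wholly of zeros being allowed, and the count is compared with the formula.

```python
def unipartite_two_parts(p):
    """P(p, 2): partitions of p into two or fewer parts; taken as zero for negative p."""
    return p // 2 + 1 if p >= 0 else 0


def bipartite_two_parts_bruteforce(p1, p2):
    """P(p1 p2, 2) by listing the unordered pairs {a, (p1 p2) - a}."""
    seen = set()
    for a1 in range(p1 + 1):
        for a2 in range(p2 + 1):
            first, second = (a1, a2), (p1 - a1, p2 - a2)
            seen.add(frozenset((first, second)))  # order of the two parts is immaterial
    return len(seen)


def bipartite_two_parts_formula(p1, p2):
    """P(p1 p2, 2) = P(p1, 2) P(p2, 2) + P(p1 - 1, 2) P(p2 - 1, 2)."""
    return (unipartite_two_parts(p1) * unipartite_two_parts(p2)
            + unipartite_two_parts(p1 - 1) * unipartite_two_parts(p2 - 1))


print(bipartite_two_parts_bruteforce(3, 3), bipartite_two_parts_formula(3, 3))
```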

71. For the multipartite number $(p_1 p_2 \ldots p_s)$ we find that

$$D_{p_1} D_{p_2} \ldots D_{p_s} \frac{1}{2!}(S_1^2 + S_2) = \text{coeff. of } x_1^{p_1}x_2^{p_2} \ldots x_s^{p_s} \text{ in}$$

$$\frac{1}{2}\left\{\frac{S_1^2}{(1-x_1)^2(1-x_2)^2 \ldots (1-x_s)^2} + \frac{S_2}{(1-x_1^2)(1-x_2^2) \ldots (1-x_s^2)}\right\},$$

and thence we establish that the partitions of the multipartite number
$(p_1 p_2 \ldots p_s)$, into two or fewer parts, are enumerated by the coeff. of
$x_1^{p_1}x_2^{p_2} \ldots x_s^{p_s}$ in

$$\frac{1}{2}\left\{\frac{1}{(1-x_1)^2(1-x_2)^2 \ldots (1-x_s)^2} + \frac{1}{(1-x_1^2)(1-x_2^2) \ldots (1-x_s^2)}\right\}$$

$$= \frac{1 + \Sigma x_1x_2 + \Sigma x_1x_2x_3x_4 + \ldots}{(1-x_1)(1-x_1^2)\,.\,(1-x_2)(1-x_2^2) \ldots (1-x_s)(1-x_s^2)},$$

the last numerator term being $\Sigma x_1x_2 \ldots x_{s-1}$ or $\Sigma x_1x_2 \ldots x_s$, according
as $s$ is uneven or even.

From this result general formulae may be constructed as above for 
the particular case s = 2. 




72. Passing to the partitions into three or fewer parts we have

$$U_3 = \frac{1}{3!}(S_1^3 + 3S_1S_2 + 2S_3),$$

$D_{p_1} U_3$ equal to the coeff. of $x_1^{p_1}$ in

$$\frac{1}{6}\left\{\frac{S_1^3}{(1-x_1)^3} + \frac{3S_1S_2}{(1-x_1)(1-x_1^2)} + \frac{2S_3}{1-x_1^3}\right\},$$

and thence the coeff. of the function $(p_1)$ in $U_3$ is equal to the coeff.
of $x_1^{p_1}$ in

$$\frac{1}{6}\left\{\frac{1}{(1-x_1)^3} + \frac{3}{(1-x_1)(1-x_1^2)} + \frac{2}{1-x_1^3}\right\},$$

or in

$$\frac{1}{(1-x_1)(1-x_1^2)(1-x_1^3)},$$

the well-known result in the case of the unipartite numbers.

Similarly for the partitions of bipartite numbers we are led to the
coeff. of $x_1^{p_1}x_2^{p_2}$ in

$$\frac{1}{6}\left\{\frac{1}{(1-x_1)^3\,.\,(1-x_2)^3} + \frac{3}{(1-x_1)(1-x_1^2)\,.\,(1-x_2)(1-x_2^2)} + \frac{2}{(1-x_1^3)\,.\,(1-x_2^3)}\right\},$$

or in

$$\frac{1 + x_1x_2 + x_1^2x_2 + x_1x_2^2 + x_1^2x_2^2 + x_1^3x_2^3}{(1-x_1)(1-x_1^2)(1-x_1^3)\,.\,(1-x_2)(1-x_2^2)(1-x_2^3)}.$$
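The bipartite generating function just obtained can be compared with a direct count. The sketch below is illustrative only and its names are assumptions: it counts the partitions of $(p_1 p_2)$ into three or fewer parts as unordered triples of parts (parts consisting wholly of zeros being allowed), and compares the count with the coefficient given by the numerator and denominator above.

```python
from itertools import product


def at_most_three(n):
    """Coefficient of x^n in 1/((1-x)(1-x^2)(1-x^3)); zero for negative n."""
    if n < 0:
        return 0
    coeffs = [1] + [0] * n
    for step in (1, 2, 3):
        for k in range(step, n + 1):
            coeffs[k] += coeffs[k - step]
    return coeffs[n]


# Exponents (i, j) of the numerator terms x1^i x2^j of the bipartite function.
NUMERATOR = [(0, 0), (1, 1), (2, 1), (1, 2), (2, 2), (3, 3)]


def from_generating_function(p1, p2):
    """Coefficient of x1^p1 x2^p2 in numerator / denominator."""
    return sum(at_most_three(p1 - i) * at_most_three(p2 - j) for i, j in NUMERATOR)


def brute_force(p1, p2):
    """Count unordered triples of bipartite parts with sum (p1, p2)."""
    vectors = list(product(range(p1 + 1), range(p2 + 1)))
    seen = set()
    for a in vectors:
        for b in vectors:
            c = (p1 - a[0] - b[0], p2 - a[1] - b[1])
            if c[0] >= 0 and c[1] >= 0:
                seen.add(tuple(sorted((a, b, c))))
    return len(seen)


for p1, p2 in [(1, 1), (2, 1), (2, 2), (3, 3)]:
    print((p1, p2), brute_force(p1, p2), from_generating_function(p1, p2))
```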


73. In general for the case of the partitions of $s$-partite numbers
into three or fewer parts we are led to the coeff. of $x_1^{p_1}x_2^{p_2} \ldots x_s^{p_s}$ in

$$\frac{1}{6}\left\{\frac{1}{(1-x_1)^3(1-x_2)^3 \ldots (1-x_s)^3} + \frac{3}{(1-x_1)(1-x_1^2)\,.\,(1-x_2)(1-x_2^2) \ldots (1-x_s)(1-x_s^2)} + \frac{2}{(1-x_1^3)(1-x_2^3) \ldots (1-x_s^3)}\right\}.$$

In a similar manner the enumerating generating function for the
partitions of the multipartite numbers $(p_1 p_2 \ldots p_s)$ into $m$ or fewer
parts can be constructed and the general problem of distribution before
us may be regarded as solved.


74. For the partitions of the unipartite number $p_1$ into $m$ or fewer
parts the generating function comes out, after simplification, in the
Eulerian form

$$\frac{1}{(1-x)(1-x^2) \ldots (1-x^m)}.$$




75. The final and most general case of distribution presents itself
when the objects have the specification $(p_1 p_2 \ldots p_s)$ and the boxes the
specification $(m_1 m_2 \ldots m_t)$, the whole number of the boxes being any
number not greater than the whole number of the objects.

Here the enumerating generating function is

$$U_{m_1} U_{m_2} \ldots U_{m_t},$$

in which we seek the coeff. of the function

$$(p_1 p_2 \ldots p_s).$$


For consider any distribution of the objects into the boxes. It
consists of objects having a certain specification distributed into boxes
of specification $(m_1)$, together with objects of other specifications distributed
into boxes of specifications $(m_2), (m_3), \ldots (m_t)$ respectively. The
aggregate of these specifications of combinations of objects constitutes
a composition of the multipartite numbers $(p_1 p_2 \ldots p_s)$ into $t$ or fewer
parts, multipartite parts consisting wholly of zeros being admissible.
Since any combination of objects may appertain to any set and the sets 
are not interchangeable, we obtain the generating function by simply 
multiplying together the generating functions which belong to the 
separate sets of boxes. 

76. The application of this theorem to the distribution of objects
of specification $(p)$ is interesting. The enumerating function is

$$\frac{1}{(1-x)(1-x^2) \ldots (1-x^{m_1})} \cdot \frac{1}{(1-x)(1-x^2) \ldots (1-x^{m_2})} \cdots \frac{1}{(1-x)(1-x^2) \ldots (1-x^{m_t})} = \frac{1}{(1-x)^{n_1}(1-x^2)^{n_2}(1-x^3)^{n_3} \ldots},$$

where the succession of numbers $n_1, n_2, n_3, \ldots$ is related to the succession
$m_1, m_2, m_3, \ldots m_t$ in the following manner.

We write down $m_1, m_2, m_3, \ldots m_t$ units in succession

$$\begin{array}{ccccll}
1 & 1 & 1 & 1 & \ldots & m_1 \text{ units} \\
1 & 1 & 1 & 1 & \ldots & m_2 \text{ units} \\
1 & 1 & 1 & 1 & \ldots & m_3 \text{ units} \\
\vdots & & & & & \\
1 & 1 & 1 & 1 & \ldots & m_t \text{ units}
\end{array}$$

the numbers $m_1, m_2, m_3, \ldots$ being assumed in descending order of magnitude
and then add by columns producing a partition $(n_1, n_2, n_3, \ldots)$
which is said to be conjugate to $(m_1 m_2 \ldots m_t)$.

We have therefore a remarkable theorem : — 

“The number of distributions of objects of specification $(p)$ into
boxes of specification $(m_1 m_2 \ldots m_t)$ is given by the coeff. of $x^p$ in the
function

$$(1-x)^{-n_1}(1-x^2)^{-n_2}(1-x^3)^{-n_3} \ldots$$

where $(n_1 n_2 n_3 \ldots)$ is the partition conjugate to $(m_1 m_2 \ldots m_t)$.”

77. As a verification observe that if $(m_1 m_2 \ldots m_t) = (m)$ the conjugate
partition is $(1^m)$ and the enumerating function is

$$(1-x)^{-1}(1-x^2)^{-1} \ldots (1-x^m)^{-1},$$

whereas if $(m_1 m_2 \ldots m_t) = (1^m)$ the conjugate partition is $(m)$ and the
enumerating function is

$$(1-x)^{-m}.$$

As another example suppose $(m_1 m_2 \ldots m_t) = (221)$; the conjugate
partition is (32) and the enumerating function is

$$(1-x)^{-3}(1-x^2)^{-2},$$

which is $1 + 3x + 8x^2 + 16x^3 + 30x^4 + \ldots$.

The distributions of the assemblages $\alpha^2$, $\alpha^3$ are

[Tables not reproduced: the distributions of $\alpha^2$ and of $\alpha^3$ into the boxes A A B B C, numbering 8 and 16 respectively.]
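The theorem of Art. 76 and the verification of Art. 77 can be reproduced mechanically. The sketch below is an illustration under assumed function names: it forms the conjugate partition by adding the rows of units by columns, develops the enumerating function, and compares it with the product of the Eulerian functions belonging to the separate sets of boxes.

```python
def conjugate(partition):
    """Conjugate partition: write each part as a row of units and add by columns."""
    parts = sorted(partition, reverse=True)
    if not parts:
        return []
    return [sum(1 for p in parts if p > j) for j in range(parts[0])]


def series_from_conjugate(exponents, terms):
    """First `terms` coefficients of the product of (1 - x^i)^(-exponents[i-1])."""
    coeffs = [1] + [0] * (terms - 1)
    for i, n_i in enumerate(exponents, start=1):
        for _ in range(n_i):
            for k in range(i, terms):
                coeffs[k] += coeffs[k - i]  # multiply by 1/(1 - x^i)
    return coeffs


def series_by_box_sets(box_spec, terms):
    """Product over the sets of boxes of 1/((1-x)(1-x^2)...(1-x^m))."""
    coeffs = [1] + [0] * (terms - 1)
    for m in box_spec:
        for i in range(1, m + 1):
            for k in range(i, terms):
                coeffs[k] += coeffs[k - i]
    return coeffs


boxes = (2, 2, 1)
print(conjugate(boxes))
print(series_from_conjugate(conjugate(boxes), 5))
print(series_by_box_sets(boxes, 5))
```

The coefficients of $x^2$ and $x^3$ so obtained are the numbers 8 and 16 of the two sets of distributions referred to in the text.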




78. If we restrict the symmetric functions utilised so that no part 
greater than k appears the effect is to restrict the distributions to the 
extent that not more than k similar objects can appear in any one box. 
We may usefully examine the case k = 1. 

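Instead of the functions $S_1, S_2, \ldots$ we take

$$A_1 = 1 + (1) + (1^2) + (1^3) + \ldots,$$
$$A_2 = 1 + (2) + (2^2) + (2^3) + \ldots,$$
$$\ldots$$
$$A_m = 1 + (m) + (m^2) + (m^3) + \ldots;$$

and then

$$U_1 = A_1,$$
$$2!\,U_2 = A_1^2 + A_2,$$
$$3!\,U_3 = A_1^3 + 3A_1A_2 + 2A_3,$$
$$\ldots$$
$$m!\,U_m = \sum \frac{m!}{m_1!\,m_2!\,m_3!\ldots} \left(\frac{A_1}{1}\right)^{m_1} \left(\frac{A_2}{2}\right)^{m_2} \left(\frac{A_3}{3}\right)^{m_3} \ldots$$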


79. For the operation of the $D$ symbol we have

$$D_1 A_1 = A_1, \quad D_0 A_1 = A_1, \quad D_m A_1 = 0 \text{ in other cases},$$

and generally

$$D_s A_m = A_m \text{ when } s = m \text{ or zero},$$
$$D_s A_m = 0 \text{ in every other case}.$$

Also the symbol, operating through the composition of its suffix into
units, yields

$$D_k A_1^m = \binom{m}{k} A_1^m, \qquad D_2 A_1^3 = \binom{3}{2} A_1^3.$$

For the operand

$$A_1^{m_1} A_2^{m_2} \ldots A_i^{m_i}$$

the symbol $D_s$ operates through the compositions of $s$ into

$$m_1 + m_2 + \ldots + m_i \text{ parts},$$

zero counting as a part. From the law of operation given above it can
be seen that for the operation associated with such a composition to
have an effect other than zero,

the first $m_1$ parts must be zero or unity,
the next $m_2$ parts must be zero or two,
the next $m_3$ parts must be zero or three,
...
the last $m_i$ parts must be zero or $i$.

The number of such compositions is the coefficient of $x^s$ in the product

$$(1+x)^{m_1}(1+x^2)^{m_2}(1+x^3)^{m_3} \ldots (1+x^i)^{m_i}$$

as is evident when the orderly multiplication is carried out (cf. Art. 14).
Thence

$$D_s A_1^{m_1} A_2^{m_2} \ldots A_i^{m_i} = A_1^{m_1} A_2^{m_2} \ldots A_i^{m_i} \times \text{coefficient of } x^s \text{ in } (1+x)^{m_1}(1+x^2)^{m_2} \ldots (1+x^i)^{m_i}.\,{}^*$$


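80. To apply this result, consider the distributing of objects of
specification $(2^{k_2} 1^{k_1})$ into two or fewer similar boxes, in other words,
the partitions of the multipartite number $(2^{k_2} 1^{k_1})$ into two or fewer
parts subject to the restriction that no box is to contain two similar
objects, or no constituent of the multipartite parts to involve numbers
greater than unity.

We find that

$$D_2 U_2 = D_2\left(\tfrac{1}{2}A_1^2 + \tfrac{1}{2}A_2\right) = \tfrac{1}{2}A_1^2 + \tfrac{1}{2}A_2,$$

because the coefficients of $x^2$ in $(1+x)^2$ and in $(1+x^2)$ are both unity.
Hence

$$D_2^{k_2} U_2 = \tfrac{1}{2}A_1^2 + \tfrac{1}{2}A_2.$$

Now

$$D_1 U_2 = D_1\left(\tfrac{1}{2}A_1^2 + \tfrac{1}{2}A_2\right) = A_1^2,$$

because the coefficients of $x$ in $(1+x)^2$ and in $(1+x^2)$ are 2 and zero
respectively.

Hence by repeated operation

$$D_2^{k_2} D_1^{k_1} U_2 = 2^{k_1-1} A_1^2,$$

establishing that the coefficient of the function $(2^{k_2} 1^{k_1})$ in $U_2$ is

$$2^{k_1-1}.$$


Ex. gr. Suppose that the objects for distribution are

$$\alpha\alpha\beta\beta\gamma\gamma\delta\epsilon\theta,$$

so that $k_2 = k_1 = 3$.

The distributions, four in number, are

$$\begin{array}{cc}
A & A \\
\alpha\beta\gamma\delta\epsilon\theta & \alpha\beta\gamma \\
\alpha\beta\gamma\delta\epsilon & \alpha\beta\gamma\theta \\
\alpha\beta\gamma\delta\theta & \alpha\beta\gamma\epsilon \\
\alpha\beta\gamma\epsilon\theta & \alpha\beta\gamma\delta
\end{array}$$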

* The reader will observe that when the magnitude of the parts of the partitions is
restricted to be not greater than the integer $k$, the corresponding function of $x$ is

$$(1 + x + x^2 + \ldots + x^k)^{m_1} (1 + x^2 + x^4 + \ldots + x^{2k})^{m_2} \ldots (1 + x^i + x^{2i} + \ldots + x^{ik})^{m_i}$$

$$= \left(\frac{1 - x^{k+1}}{1 - x}\right)^{m_1} \left(\frac{1 - x^{2k+2}}{1 - x^2}\right)^{m_2} \ldots \left(\frac{1 - x^{ik+i}}{1 - x^i}\right)^{m_i}.$$




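Again, let the objects be of specification $(3^{k_3} 2^{k_2} 1^{k_1})$ and let there be
three or fewer similar boxes, the distributions being subject to the same
restriction as before.

We find that

$$D_3\,\tfrac{1}{6}(A_1^3 + 3A_1A_2 + 2A_3) = \tfrac{1}{6}(A_1^3 + 3A_1A_2 + 2A_3),$$

because the coefficients of $x^3$ in

$$(1+x)^3, \quad (1+x)(1+x^2), \quad (1+x^3)$$

are all equal to unity.

Hence $D_3^{k_3}$ gives $\tfrac{1}{6}(A_1^3 + 3A_1A_2 + 2A_3)$.

Now the coefficients of $x^2$ in the three functions of $x$ are 3, 1 and 0.
Hence

$$D_2\,\tfrac{1}{6}(A_1^3 + 3A_1A_2 + 2A_3) = \tfrac{1}{2}(A_1^3 + A_1A_2)$$

and

$$D_3^{k_3} D_2^{k_2}\,\tfrac{1}{6}(A_1^3 + 3A_1A_2 + 2A_3) = \tfrac{1}{2}(3^{k_2-1}A_1^3 + A_1A_2),$$

and finally

$$D_3^{k_3} D_2^{k_2} D_1^{k_1}\,\tfrac{1}{6}(A_1^3 + 3A_1A_2 + 2A_3) = \tfrac{1}{2}(3^{k_2+k_1-1}A_1^3 + A_1A_2),$$

establishing that the coefficient of the function $(3^{k_3} 2^{k_2} 1^{k_1})$ in $U_3$ is

$$\tfrac{1}{2}(3^{k_2+k_1-1} + 1).$$

Ex. gr. If the objects to be distributed are

$$\alpha\alpha\alpha\ \beta\beta\beta\ \gamma\gamma\ \delta\delta\ \epsilon\epsilon\ \theta,$$

so that $k_3 = 2$, $k_2 = 3$, $k_1 = 1$,

we have the fourteen distributions

$$\begin{array}{ccc|ccc}
A & A & A & A & A & A \\
\alpha\beta\gamma\delta\epsilon\theta & \alpha\beta\gamma\delta\epsilon & \alpha\beta & \alpha\beta\gamma\delta\epsilon & \alpha\beta\gamma\delta\epsilon & \alpha\beta\theta \\
\alpha\beta\gamma\delta\epsilon\theta & \alpha\beta\delta\epsilon & \alpha\beta\gamma & \alpha\beta\gamma\epsilon\theta & \alpha\beta\delta\epsilon & \alpha\beta\gamma\delta \\
\alpha\beta\gamma\delta\epsilon\theta & \alpha\beta\gamma\epsilon & \alpha\beta\delta & \alpha\beta\gamma\delta\epsilon & \alpha\beta\delta\theta & \alpha\beta\gamma\epsilon \\
\alpha\beta\gamma\delta\epsilon\theta & \alpha\beta\gamma\delta & \alpha\beta\epsilon & \alpha\beta\gamma\delta\epsilon & \alpha\beta\delta\epsilon & \alpha\beta\gamma\theta \\
\alpha\beta\gamma\delta\epsilon & \alpha\beta\delta\epsilon\theta & \alpha\beta\gamma & \alpha\beta\delta\epsilon\theta & \alpha\beta\gamma\delta & \alpha\beta\gamma\epsilon \\
\alpha\beta\gamma\epsilon\theta & \alpha\beta\gamma\delta\epsilon & \alpha\beta\delta & \alpha\beta\gamma\delta\theta & \alpha\beta\gamma\epsilon & \alpha\beta\delta\epsilon \\
\alpha\beta\gamma\delta\epsilon & \alpha\beta\gamma\delta\theta & \alpha\beta\epsilon & \alpha\beta\gamma\delta\epsilon & \alpha\beta\gamma\delta & \alpha\beta\epsilon\theta
\end{array}$$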
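Both examples of Art. 80 can be confirmed by exhaustive enumeration. The following sketch is an illustration only (the function name and the encoding of letters by their multiplicities are assumptions): each letter occurring $c$ times is placed in $c$ different boxes, and two arrangements are identified when they differ only by a permutation of the similar boxes.

```python
from itertools import combinations, product


def restricted_distributions(multiplicities, boxes):
    """Count distributions into `boxes` or fewer similar boxes, no box holding two
    similar objects. multiplicities[i] is the number of copies of the i-th letter."""
    # A letter with c copies occupies c distinct boxes.
    options = [list(combinations(range(boxes), c)) for c in multiplicities]
    seen = set()
    for choice in product(*options):
        contents = [[] for _ in range(boxes)]
        for letter, chosen_boxes in enumerate(choice):
            for b in chosen_boxes:
                contents[b].append(letter)
        # Sorting the boxes removes the distinction between similar boxes.
        seen.add(tuple(sorted(tuple(c) for c in contents)))
    return len(seen)


# Objects of specification (2^3 1^3) into two or fewer boxes.
print(restricted_distributions((2, 2, 2, 1, 1, 1), 2))
# Objects of specification (3^2 2^3 1^1) into three or fewer boxes.
print(restricted_distributions((3, 3, 2, 2, 2, 1), 3))
```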


81. In general to shew the nature of the theorems more clearly we 
observe that 

$$D_{p_1} D_{p_2} \ldots D_{p_s} U_2 = \text{coefficient of } x_1^{p_1}x_2^{p_2} \ldots x_s^{p_s} \text{ in}$$

$$\tfrac{1}{2}\{(1+x_1)(1+x_2) \ldots (1+x_s)\}^2 A_1^2 + \tfrac{1}{2}(1+x_1^2)(1+x_2^2) \ldots (1+x_s^2)\,A_2,$$

so that the enumerating function is

$$\tfrac{1}{2}\{(1+x_1)(1+x_2) \ldots (1+x_s)\}^2 + \tfrac{1}{2}(1+x_1^2)(1+x_2^2) \ldots (1+x_s^2),$$

and similarly, derived from

$$D_{p_1} D_{p_2} \ldots D_{p_s} U_3$$

we obtain the enumerating function

$$\tfrac{1}{6}\{(1+x_1)(1+x_2) \ldots (1+x_s)\}^3 + \tfrac{1}{2}\{(1+x_1)(1+x_2) \ldots (1+x_s) \times (1+x_1^2)(1+x_2^2) \ldots (1+x_s^2)\} + \tfrac{1}{3}(1+x_1^3)(1+x_2^3) \ldots (1+x_s^3),$$

and so on in the higher cases. 
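Since each enumerating function of Art. 81 is a sum of products of factors in the separate variables, the coefficient of $x_1^{p_1}x_2^{p_2} \ldots x_s^{p_s}$ is found factor by factor. The sketch below (illustrative, with assumed function names) evaluates the coefficient in this way for two and for three boxes, using the facts that the coefficient of $x^p$ in $(1+x)^n$ is a binomial coefficient and that $(1+x)(1+x^2) = 1 + x + x^2 + x^3$.

```python
from fractions import Fraction
from math import comb, prod


def coeff_two_boxes(spec):
    """Coefficient of x_1^{p_1}...x_s^{p_s} in the two-box enumerating function."""
    square = prod(comb(2, p) for p in spec)                 # {(1+x_1)...(1+x_s)}^2
    doubled = prod(1 if p in (0, 2) else 0 for p in spec)   # (1+x_1^2)...(1+x_s^2)
    return Fraction(square + doubled, 2)


def coeff_three_boxes(spec):
    """Coefficient of x_1^{p_1}...x_s^{p_s} in the three-box enumerating function."""
    cube = prod(comb(3, p) for p in spec)                   # {(1+x_1)...(1+x_s)}^3
    mixed = prod(1 if 0 <= p <= 3 else 0 for p in spec)     # (1+x)(1+x^2) per variable
    tripled = prod(1 if p in (0, 3) else 0 for p in spec)   # (1+x_1^3)...(1+x_s^3)
    return Fraction(cube, 6) + Fraction(mixed, 2) + Fraction(tripled, 3)


print(coeff_two_boxes((2, 2, 2, 1, 1, 1)))
print(coeff_three_boxes((3, 3, 2, 2, 2, 1)))
```

The two printed values are the numbers of distributions found in the two examples of Art. 80.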

In conclusion it will be clear that an important part of the Theory 
of Combinations and Permutations is intimately connected with the 
Theory of Symmetric Functions in elementary algebra. 

In Combinatory Analysis, by the author, the correspondence is carried 
much further and it is shewn that either theory is a powerful instrument 
of research in the other. The fact is that in theorems of Combinations 
and Permutations the entities dealt with come into consideration in a 
symmetrical manner and a symmetrical method of investigation is at once 
suggested. Moreover it will be found in nearly every case that the 
appropriate method, though it may be in appearance devoid of symmetry, 
is when sufficiently examined, symmetrical. The binomial coefficients 
which enter largely into combinatory theorems are themselves symmetric 
functions of zero weight. Ex. gr. 




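$$\Sigma\alpha^0\beta^0 = (0^2) = \binom{n}{2}, \qquad \Sigma\alpha^0\beta^0\gamma^0 = (0^3) = \binom{n}{3},$$

the number of the quantities $\alpha, \beta, \gamma, \ldots$ being $n$.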

There is an algebra of these numerical functions which deals with 
their representation by means of partitions with zero parts and we find 
an appropriate operator 


$$D_0$$

which operates through the compositions of zero into zero parts. These
are identical with the partitions of zero into zero parts and are infinite
in number, viz.:

0, 00, 000, 0000, ad inf.

It has been noted that the operator

$$D_m$$

where $m$ is a positive integer > 0 is in fact a partial differential operator
of the order $m$. When $m = 0$, we find that the operator is one that is
met with in the Calculus of Finite Differences.

There is throughout a corresponding theory of the enumeration of 
numbered diagrams of the ‘Magic Square’ type which has been much 
developed and will without doubt be the subject of further investiga- 
tions. 





THREE LECTURES ON 
FERMAT’S LAST THEOREM 


BY 

L. J. MORDELL 

MANCHESTER COLLEGE OF TECHNOLOGY 




PREFACE 


In March 1920, I gave at Birkbeck College, London, a course
of three public lectures on Fermat’s Last Theorem. The 
lectures were intended primarily for persons with a mathe- 
matical training, but not necessarily for those who had made a 
special study of the Theory of Numbers. A general account 
was given of the various methods that have been devised for 
dealing with the question, more attention being paid to principles 
than to details. 

This booklet consists of the lectures in practically the form 
in which they were delivered. It also includes a few details 
which it was found convenient to omit from the lectures. I hope 
it may be of assistance in giving to the reader some idea, not 
only of the difficulties involved, but also of the progress made 
in dealing with this famous theorem. 

I have to acknowledge my indebtedness not only to the
authors mentioned herein, but also to the works of Smith, Bachmann,
Hilbert, Kronecker, Sommer, and Dickson, on the Theory
of Numbers. Full references to the subject are given by Dickson
in his very useful paper on “Fermat’s Last Theorem” in the
Annals of Mathematics, Vol. XVIII, 1917; and in Vol. II of his
History of the Theory of Numbers, which has just been published.


L. J. MORDELL. 


November 1920 . 





CHAPTER I 


STATEMENT OF THE THEOREM 

Of all the abstract sciences, perhaps none is so remarkable for the 
ease with which theorems are arrived at inductively, and for the 
difficulty and importance of the developments arising in the efforts 
to prove the theorems so suggested, as the Theory of Numbers. 
An admirable illustration of this fact is furnished by Fermat’s Last 
Theorem, namely, that if n is a positive integer greater than two, 
the equation 

$$x^n + y^n = z^n \qquad (1)$$

cannot be satisfied by integer values for the unknowns x, y, and z 
unless one of them is zero. On the contrary, when n = 2, it is well 
known that the equation possesses an infinite number of solutions in 
integers. 

Fermat (1601—1665) was a French mathematician of the first 
rank, who made a special study of the Theory of Numbers, including 
that part of the subject dealing with the solution of indeterminate 
equations, called Diophantine Analysis, after the Greek mathematician 
Diophantus who flourished during* the third century A.D. A new
edition of the latter’s works was brought out by Bachet in 1621. Fermat
possessed a copy of this work and entered in the margin of the pages
a number of theorems he had discovered, most of which are now
included as special cases in the classical theory of the subject, but
without any proofs or indications of his methods. Beside the theorem
now known as his last theorem, he placed the remark that he had
discovered a truly wonderful proof but that the margin of the book
was too small to contain it. Since that time, no general proof has
been found for all values of $n$, even though it has been attempted by
the greatest of mathematicians including Euler, Legendre, Gauss,
Abel, Dirichlet, Cauchy and Kummer, has been several times made the
prize question of learned societies such as the Academies at Paris and
Brussels, and though finally, in 1907, a prize of 100,000 marks was
established for the first proof.

* Diophantus of Alexandria, by Sir Thomas L. Heath, 2nd edition, p. 2.




Did Fermat prove his Theorem?

The question immediately suggests itself. — Is it probable, that 
nearly three centuries ago, Fermat really proved this theorem, which 
still baffles mathematicians who have at their disposal the wonder- 
ful and far reaching developments in mathematics since Fermat’s 
time — especially as it seems likely that Fermat’s methods could only 
be elementary considered from a modern standpoint ? From what is 
known of Fermat’s character, it is fairly certain that at any rate he 
was under the impression that he had a proof meriting his description 
of it. This statement is confirmed by the fact that when enunciating 
a theorem to the effect that $2^{2^n} + 1$ was a prime number for all positive
integral values of $n$, he added that while convinced of the truth of this
theorem, he could not prove it*. Many years afterwards Euler showed 
that the theorem was false, and that 641 was a factor when n = 5. 
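Euler's observation is readily confirmed by machine arithmetic. The sketch below is a modern illustration, not Euler's argument; the function names are assumptions. It tests the numbers $2^{2^n} + 1$ for $n$ from 0 to 4 by trial division and exhibits the factor 641 when $n = 5$.

```python
def fermat_number(n):
    """The number 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


def is_prime(k):
    """Primality by trial division; adequate for the small cases tested here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


for n in range(5):
    print(n, fermat_number(n), is_prime(fermat_number(n)))

quotient, remainder = divmod(fermat_number(5), 641)
print(fermat_number(5), "= 641 x", quotient, "with remainder", remainder)
```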

It is of course possible that Fermat was mistaken in thinking that 
his proof was valid, for even the greatest of mathematicians have 
made mistakes. The late Prof. H. J. S. Smith, while pointing out 
that Gauss was unfavourably inclined to Fermat, thought however 
that there was no ground for supposing that Fermat was mistaken. 

Analysis of another statement by Fermat 

A little light perhaps may be thrown on Fermat’s statement by 
considering a similar case. He had proposed as a problem to the 
English mathematicians to show that there was only one integer 
solution of the equation 

$$y^3 = x^2 + 2,$$

obviously $x = \pm 5$, $y = 3$. On this he has a note to the effect that
there was no difficulty in finding a solution in fractions, but that 
he had discovered an entirely new method, wonderfully beautiful and 
most subtle, which enabled him to solve such questions in integers. 
This statement seems clear and straightforward and would lead one to 
suppose that, given an equation of the form 

$$y^3 = x^2 + k,$$

where k is an integer, Fermat possessed a method which enabled him 
to ascertain if the equation possessed integer solutions, and in that 
case to find them. 


* See however Dickson’s History of the Theory of Numbers, Vol. I, p. 375.




Fermat’s statement, however, cannot be as clear as it seems to be.
For I showed* several years ago that no equation of the form 

$$y^3 = x^2 + k,$$

where k is a positive or negative integer, could have more than a 
finite number of integer solutions. It seems unlikely that this fact 
was known to Fermat, so we are led to conjecture that his method 
must have been equivalent to some such process as the following. 

In y^3 = x^2 + 2,

put y = a^2 + 2b^2,

where a and b are integers, and take

x + \sqrt{-2} = (a + b\sqrt{-2})^3.

Equating real and imaginary parts

x = a^3 - 6ab^2,
1 = b(3a^2 - 2b^2).

Since a and b are integers

b = ±1, 3a^2 - 2b^2 = ±1,

or b = ±1, a = ±1,

giving x = ±5, y = 3.

It is of course by no means obvious that this process, which can 
be described without the use of complex quantities, gives all the 
values of x and y. In any case it seems doubtful if Fermat’s descrip- 
tion of his method would be justified at the present time. 
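The process just described can be compared with a direct search. The sketch below is an illustration under our own naming (cube_parametrisation and brute_force are not from the text): the first function expands (a + b\sqrt{-2})^3, the second lists the integer solutions of y^3 = x^2 + 2 in a finite range. A finite search proves nothing about all integers; it only shows agreement within the range tried.

```python
def cube_parametrisation(a: int, b: int):
    """Real and imaginary parts of (a + b*sqrt(-2))**3."""
    real = a**3 - 6 * a * b**2
    imag = b * (3 * a**2 - 2 * b**2)
    return real, imag


def brute_force(limit: int):
    """All integer pairs (x, y) with y**3 == x**2 + 2 and |x| <= limit."""
    found = []
    for x in range(-limit, limit + 1):
        target = x * x + 2
        guess = round(target ** (1 / 3))
        # Test the neighbours of the rounded cube root to guard against
        # floating-point error.
        for y in (guess - 1, guess, guess + 1):
            if y**3 == target:
                found.append((x, y))
    return found
```

For a = ±1, b = 1 the imaginary part is 1 and the real part is ∓5, in agreement with the only solutions found by the search for |x| up to 1000.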

At the same time it is possible that Fermat did possess a valid 
proof of his last theorem, but from the circumstances of the case, it is 
extremely difficult for us to form any conception of his method. One 
can easily recall a number of theorems which have proved extra- 
ordinarily difficult to great mathematicians and which now seem 
elementary enough. Nothing can appear simpler than the solution of 
the cubic equation, but many centuries elapsed between the solution 
of the quadratic and of the cubic. Another instance of a different 
type is supplied by the proof of the transcendental character of 
π, i.e. the impossibility of solving the problem commonly called the

* A statement by Fermat, Proceedings of the London Mathematical Society
(Read Feb. 1918), (Records, etc.), Ser. 2, Vol. xviii. (1919), pp. v, vi. The same
result holds for the equation

ey^2 = ax^3 + bx^2 + cx + d,

where a, b, c, d, e are any integers for which the right-hand side has no squared 
factors in x. 




squaring of the circle, which can now be put in a very simple and 
elementary form. 

Mathematical study and research are very suggestive of moun- 
taineering. Whymper made seven efforts before he climbed the Matter- 
horn in the 1860’s and even then it cost the lives of four of his party. 
Now, however, any tourist can be hauled up for a small cost, and 
perhaps does not appreciate the difficulty of the original ascent. So in 
mathematics, it may be found hard to realise the great initial difficulty 
of making a little step which now seems so natural and obvious, and it 
may not be surprising if such a step has been found and lost again. 

A SIMPLIFICATION OF THE PROBLEM 
Coming back to the equation

x^n + y^n = z^n (1),

it is obvious that if any two of the unknowns have a common factor k,
then the third unknown is also divisible by k. Putting
x, y, z = kξ, kη, kζ,

respectively, in equation (1), k^n divides out, leaving

ξ^n + η^n = ζ^n,

where now no two of the unknowns have a common factor. There 
will be no loss of generality then if it is supposed that no two of the 
unknowns in the original equation (1) have a common factor. Next it 
is sufficient to prove the impossibility of equation (1) when n is equal 
to 4 or to any odd prime p. For if each of the equations 

x^4 + y^4 = z^4, x^p + y^p = z^p (2)

is insoluble, the same holds of equation (1) since n must be divisible by 
either 4 or an odd prime. For example if n is divisible by p, say 
n = pq, equation (1) can be written as

(x^q)^p + (y^q)^p = (z^q)^p,

which is then insoluble because of the special case (2).
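The reduction can be sketched in a few lines of Python. The function name reduced_exponent is our own, chosen for illustration; for any n greater than 2 it returns an exponent, either 4 or an odd prime, which divides n.

```python
def reduced_exponent(n: int) -> int:
    """For n > 2 return 4 or an odd prime dividing n."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    m = n
    while m % 2 == 0:
        m //= 2
    if m == 1:
        # n is a power of 2 greater than 2, hence divisible by 4.
        return 4
    # m is now the odd part of n; return its smallest prime factor.
    d = 3
    while d * d <= m:
        if m % d == 0:
            return d
        d += 2
    return m
```

Thus a solution with n = 12 would give one with exponent 3, and a solution with n = 8 would give one with exponent 4.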

The equation x^2 + y^2 = z^2

As regards the case n = 2, it is well known that the general solution of

x^2 + y^2 = z^2 (3),

wherein no two of the unknowns have a common factor, is given by
x = a^2 - b^2, y = 2ab, z = a^2 + b^2,

where y is that one of the unknowns which is even, and a and b 
are prime to each other and not both odd. This result was known to 
the Indian mathematicians. 
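The formulae can be tried out as follows. This is a minimal sketch in Python; the function name primitive_triples and the bound max_a are assumptions made for the illustration.

```python
from math import gcd


def primitive_triples(max_a: int):
    """Triples (x, y, z) from x = a^2 - b^2, y = 2ab, z = a^2 + b^2,
    with a > b >= 1, a and b prime to each other and not both odd."""
    triples = []
    for a in range(2, max_a + 1):
        for b in range(1, a):
            if gcd(a, b) == 1 and (a - b) % 2 == 1:
                triples.append((a * a - b * b, 2 * a * b, a * a + b * b))
    return triples
```

With max_a = 3 the list consists of (3, 4, 5) and (5, 12, 13); in every triple y is the even unknown.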




The equation x^4 + y^4 = z^4

The case n = 4 is remarkable not only from the fact that in contra- 
distinction to all the other values of n, the theorem can be rigorously 
proved by absolutely elementary means, that is by methods which do 
not implicitly make use of new ideas unknown during Fermat’s time, 
but also from the fact that a proof by Fermat for a very closely related 
theorem is extant. A proof was given by Leibnitz in a manuscript 
dated December 1678, and also by Euler. 

The proof of the theorem is so simple that it will be worth while 
giving it completely. 

It is obviously sufficient to consider the equation
x^4 + y^4 = z^2,

where x, y and z are all prime to each other. Further it may be
assumed that all the quantities referred to are positive. As all
numbers are either odd or even, x is of the form 2m or 2m + 1, where
m is an integer. Hence x^2 is of the form 4m^2 or 4m^2 + 4m + 1, that is
of the form 4M or 4M + 1, so that a number of the form 4M + 2 or
4M + 3 cannot be a square. Hence x and y cannot both be odd, for
then the sum of their fourth powers would be of the form 4M + 2, and
this cannot be a square. Hence either x or y must be even, and as it 
is obviously immaterial which one is, suppose it is y. Since 

(x^2)^2 + (y^2)^2 = z^2,

it follows from equation (3) that we must have

x^2 = a^2 - b^2, y^2 = 2ab, z = a^2 + b^2,
where a and b are prime to each other, and not both odd. From

x^2 = a^2 - b^2

we see that a cannot be even, for then b would be odd and x^2 would be
of the form 4M + 3, which is impossible. We have then

x^2 + b^2 = a^2,

where b is even, a is odd and prime to b, so that no two of a, b, x have
a common factor. Hence it follows from equation (3) that
x = p^2 - q^2, b = 2pq, a = p^2 + q^2,
where p and q are prime to each other and not both odd. From
y^2 = 2ab,

we have

y^2 = 4pq(p^2 + q^2).




Since p and q are prime to each other, each of them is prime to
p^2 + q^2, and hence all three must be perfect squares. Put then
p = r^2, q = s^2, p^2 + q^2 = t^2,
from which r^4 + s^4 = t^2.

Now the values of x, y, z in terms of r, s, t are given by

x = r^4 - s^4, y = 2rst, z = a^2 + b^2 = r^8 + 6r^4 s^4 + s^8,
so that z > (r^4 + s^4)^2 = t^4, or t < z^{1/4}.

It follows then that if one solution of the equation 

x^4 + y^4 = z^2

is known for which none of the unknowns is zero, another solution
(r, s, t) can be found for which none of the unknowns is zero and
such that t < z^{1/4}. This process can be continued, so that an infinite
number of positive integers t, t_1, t_2, ... can be found such that

t_1 < t^{1/4}, t_2 < t_1^{1/4}, ...,

which is clearly absurd. 

This proves the impossibility in the case of n = 4, the method of 
proof being known as the method of infinite descent. 
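Three ingredients of the proof lend themselves to a mechanical check: the remainders of squares on division by 4, the absence of small solutions, and the algebraic identity behind the expressions for x, y, z in terms of r, s, t. The Python sketch below is an illustration only; all function names are ours, and the finite search is of course no substitute for the descent.

```python
from math import isqrt


def squares_mod_4():
    """Remainders which a square can leave on division by 4."""
    return sorted({(m * m) % 4 for m in range(4)})


def solutions(limit: int):
    """Positive (x, y, z) with x <= y <= limit and x^4 + y^4 = z^2."""
    found = []
    for x in range(1, limit + 1):
        for y in range(x, limit + 1):
            total = x**4 + y**4
            z = isqrt(total)
            if z * z == total:
                found.append((x, y, z))
    return found


def descent_identity(r: int, s: int) -> bool:
    """With t^2 = r^4 + s^4, x = r^4 - s^4, y = 2rst, z = r^8 + 6r^4 s^4 + s^8,
    check x^4 + y^4 = z^2 (y^4 is written with t^4 = (r^4 + s^4)^2)."""
    x4 = (r**4 - s**4) ** 4
    y4 = 16 * r**4 * s**4 * (r**4 + s**4) ** 2
    z = r**8 + 6 * r**4 * s**4 + s**8
    return x4 + y4 == z * z
```

The search returns an empty list for every bound tried, as the theorem requires.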


The equation x^3 + y^3 = z^3

The case n = 3, that is the equation

x^3 + y^3 = z^3 (4),

had been known to the Arabian mathematicians nearly seven hundred 
years before the time of Fermat, and a faulty proof of the impossibility 
had been given by them. It is very probable that Fermat discovered 
this special case before he discovered the general theorem, for he had 
proposed as a problem “to find values of x, y, and z satisfying the 
equation,” and had later declared it was impossible. Euler was the 
first to prove the theorem for this special case, but his proof was 
incomplete in respect of an assumption wherein lay the real difficulty 
of the question, and which contained the germ of the development of 
the theory of ideals which was to be applied so successfully by Kummer
many years later. Euler’s proof as given in his Algebra is substantially 
as follows. 

Two of the unknowns x, y, z must be odd, and as any of the 
unknowns may be either positive or negative, there is no loss of 
generality in supposing that z is even, and that x and y are both odd. 



Write then x + y = 2p, x - y = 2q,

so that x = p + q, y = p - q,

and the original equation becomes

2p(p^2 + 3q^2) = z^3.

Now p and q are prime to each other, and cannot both be odd, for 
then x and y would not be prime to each other. Further p cannot be 
odd and q even, for then z 3 would be divisible by 2 and not by 8, 
which is impossible. Hence p must be even and q odd, so that p^2 + 3q^2
is odd. Hence as p and q are prime to each other,

2p and p^2 + 3q^2

are either prime to each other, or have a common factor 3. In the first 
case p and hence z are both prime to 3, while in the latter case they 
are both divisible by 3. 

Let us consider the first case in detail. As 2p and p^2 + 3q^2 are
prime to each other, each must be a perfect cube, so that we can
write

p^2 + 3q^2 = r^3 (5).

Values of p, q, r can be found by taking

r = m^2 + 3n^2,

where m and n are integers, and writing

p + q\sqrt{-3} = (m + n\sqrt{-3})^3.

By equating real and imaginary parts

p = m^3 - 9mn^2, q = 3m^2 n - 3n^3,

and if m and n are prime to each other and not both odd, and m is
not divisible by 3, then p and q are prime to each other and p is not
divisible by 3. But though this method gives suitable values of p, q, r
satisfying

p^2 + 3q^2 = r^3,

it is by no means obvious that all the values of p, q, r can be found in
this way, though as a matter of fact it is so in this particular case. If
the equation had been

p^2 + 11q^2 = r^3,

all the values of p and q would not be given by putting
p + q\sqrt{-11} = (m + n\sqrt{-11})^3.

The removal of the difficulty involves the study of the arithmetical 
theory of the binary quadratic form, or of ideal numbers. 
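A single numerical instance may make the difficulty concrete. The values p = 4, q = 1, r = 3 satisfy p^2 + 11q^2 = r^3, yet 3 is not of the form m^2 + 11n^2, so this solution cannot arise from the cube of m + n\sqrt{-11} with integers m, n. The example and the names in the Python sketch below (cube_form, from_cube) are our own illustration and are not taken from the text.

```python
def cube_form(m: int, n: int, d: int):
    """(m + n*sqrt(-d))**3 written as (p, q), meaning p + q*sqrt(-d)."""
    p = m**3 - 3 * d * m * n**2
    q = 3 * m**2 * n - d * n**3
    return p, q


def from_cube(p: int, q: int, d: int, search: int = 10) -> bool:
    """True if (p, q) equals cube_form(m, n, d) for some |m|, |n| <= search.
    The bound is ample here, since m^2 + d*n^2 must equal the cube root
    of p^2 + d*q^2."""
    return any(
        cube_form(m, n, d) == (p, q)
        for m in range(-search, search + 1)
        for n in range(-search, search + 1)
    )
```

By contrast the solution p = -10, q = 9, r = 7 of p^2 + 3q^2 = r^3 does arise from m = 2, n = 1.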




Now since 2p is a cube, the values of m and n are such that
2m(m + 3n)(m - 3n)

is a perfect cube.

But since q = 3n(m + n)(m - n)

is odd, n is odd and m is even. Hence since m is prime to 3, no two of
2m, m + 3n, m - 3n can have a common factor; and since their product
is a perfect cube, each of them must be a cube. Put then
m + 3n = a^3, m - 3n = b^3, 2m = c^3,
so that by addition a^3 + b^3 = c^3.

Hence z^3 = 2p(p^2 + 3q^2) = a^3 b^3 c^3 (m^2 + 3n^2)^3,

or z = abc(m^2 + 3n^2) = (1/3)abc(a^6 + a^3 b^3 + b^6),

so that as a and b cannot both be unity, z is numerically greater
than c. It follows then, just as in the case when n = 4, that we should
have an infinite sequence of numerically decreasing integers, which is
impossible.

The same result follows in the second case when z is divisible by 3, 
but we need not go into details. 
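The algebraic identities on which the first case rests can be verified numerically. The sketch below, in Python, is an illustration with names of our own choosing; it checks the identities for given integers and does not reproduce the descent itself.

```python
from fractions import Fraction


def euler_step(m: int, n: int):
    """p, q, r obtained from p + q*sqrt(-3) = (m + n*sqrt(-3))**3."""
    p = m**3 - 9 * m * n**2
    q = 3 * m**2 * n - 3 * n**3
    r = m**2 + 3 * n**2
    return p, q, r


def identities_hold(m: int, n: int) -> bool:
    p, q, r = euler_step(m, n)
    x, y = p + q, p - q
    return (
        x**3 + y**3 == 2 * p * (p * p + 3 * q * q)
        and p * p + 3 * q * q == r**3
        and 2 * p == 2 * m * (m + 3 * n) * (m - 3 * n)
        and q == 3 * n * (m + n) * (m - n)
    )


def cube_relation(a: int, b: int) -> bool:
    """With m + 3n = a^3 and m - 3n = b^3, check
    m^2 + 3n^2 = (a^6 + a^3 b^3 + b^6) / 3."""
    m = Fraction(a**3 + b**3, 2)
    n = Fraction(a**3 - b**3, 6)
    return m * m + 3 * n * n == Fraction(a**6 + a**3 * b**3 + b**6, 3)
```

Exact rational arithmetic is used in cube_relation because m and n need not be integers for arbitrary a and b.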


The equations x^5 + y^5 = z^5 and x^7 + y^7 = z^7


The next cases to be proved were when n = 5 or 7. The first case
was dealt with by Legendre and Dirichlet in 1825, while the case
of n = 7 was proved in 1840 by Lamé and Lebesgue. The proofs
involved ideas not greatly dissimilar from the case when n = 3 and
depended upon two facts. Firstly, that if p is a prime and x and y are
prime to each other, the two expressions

x + y and (x^p + y^p)/(x + y)

are either prime to each other, or have as a common factor the first
power only of p. The proof is immediate, for putting

x + y = s,

the two expressions become

s and (x^p + (s - x)^p)/s,

or

s and s^{p-1} - p s^{p-2} x + (p(p-1)/2) s^{p-3} x^2 - ... + p x^{p-1}.


Also s is prime to x, whence the result, which is due to Jaquemet 
(1651—1729), follows. 
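The first fact is readily tested for small values. The following Python sketch is an illustration only (the names cofactor and common_factor are ours); it takes p to be an odd prime and x, y positive and prime to each other.

```python
from math import gcd


def cofactor(x: int, y: int, p: int) -> int:
    """(x^p + y^p) / (x + y); the division is exact when p is odd."""
    return (x**p + y**p) // (x + y)


def common_factor(x: int, y: int, p: int) -> int:
    """Greatest common factor of x + y and (x^p + y^p)/(x + y)."""
    return gcd(x + y, cofactor(x, y, p))
```

For x = 1, y = 2, p = 3 the two expressions are 3 and 3, with common factor 3; for x = 1, y = 3, p = 3 they are 4 and 7, which are prime to each other.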




The second fact is that

(x^p + y^p)/(x + y)

can be written in the form

(1/4)(U^2 - (-1)^{(p-1)/2} p V^2),

where U and V are polynomials in x and y, but the proof is more
complicated than that for the first fact so it may be omitted here. In
the particular case, however, when n = 5 or 7, there is no difficulty in
finding U and V by elementary algebra.


Taking now the case n = 5, we have

z^5 = (x + y) (x^5 + y^5)/(x + y),

or

z^5 = (x + y) (1/4)(U^2 - 5V^2),

where U and V are quadratic functions of x and y. From the above,
the two factors on the right-hand side are either prime to each other,
or have a common factor 5 of which the first power only will divide
(1/4)(U^2 - 5V^2). We then have an equation of the form

(1/4)(U^2 - 5V^2) = W^5 or 5W^5.

The difficulty arising in the case n = 3, and overlooked by Euler, occurs
in the discussion of these equations, but it is possible to avoid it by
similar methods. It follows also that the case n = 5 is impossible.
A similar method applies to the case n = 7, but more algebra is required
than for n = 5.
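For n = 5 one admissible pair of quadratic functions is U = 2x^2 - xy + 2y^2, V = xy. This particular choice is our own illustration, found by elementary algebra, and is not quoted from the text; the Python sketch below verifies that it satisfies the required identity.

```python
def quartic_factor(x: int, y: int) -> int:
    """(x^5 + y^5)/(x + y) written out as a polynomial."""
    return x**4 - x**3 * y + x**2 * y**2 - x * y**3 + y**4


def U(x: int, y: int) -> int:
    return 2 * x * x - x * y + 2 * y * y


def V(x: int, y: int) -> int:
    return x * y


def identity_holds(x: int, y: int) -> bool:
    """Check 4 (x^5 + y^5)/(x + y) = U^2 - 5 V^2."""
    return 4 * quartic_factor(x, y) == U(x, y) ** 2 - 5 * V(x, y) ** 2
```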


CHAPTER II 


KUMMER’S WORK

The difficulties arising with increasing values of n soon made it 
clear that other methods were required for the general case. These, of 
which we shall now give an account, were introduced by Kummer
(1810-1893). His results were the most important contribution to
this subject by any mathematician either before or after his time.
Not only were they the most general, in that he succeeded in proving
Fermat’s Last Theorem for a large number of values of n included in
several classes, but they were also the most useful, and marked an
important stage in the development of mathematics. The theory of
ideals, which is now part of the fundamental groundwork of the Theory
of Numbers, had its origin in Kummer’s researches on this subject and
the general law of reciprocity. His methods and results were the 
starting point of numerous investigations commenced many years 
after his time, and have led to some very surprising results even 
within the last twelve years. His work is an excellent illustration 
of the great indebtedness of mathematics and mathematicians to the 
consideration of one or two isolated questions. 

Writing the equation (p an odd prime)

x^p + y^p = z^p

in the form

(x + y)(x + ζy)(x + ζ^2 y) ... (x + ζ^{p-1} y) = z^p (6),

where ζ is a complex pth root of unity, the attention of mathematicians
was drawn to the study of expressions of the form
a + bζ + cζ^2 + ...,

where a, b, c ... are integers, and to inquire if the ordinary laws of 
arithmetic applied to such expressions. 
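The factorisation of x^p + y^p into linear factors can be checked numerically. The Python sketch below is an illustration only: it uses floating-point complex numbers, so the agreement is approximate, and the name product_of_factors is ours.

```python
import cmath


def product_of_factors(x: float, y: float, p: int) -> complex:
    """Product of (x + zeta^k y) for k = 0, 1, ..., p - 1,
    where zeta = exp(2*pi*i/p) is a complex pth root of unity."""
    zeta = cmath.exp(2j * cmath.pi / p)
    result = 1 + 0j
    for k in range(p):
        result *= x + zeta**k * y
    return result
```

For p odd the product agrees with x^p + y^p up to rounding error; for instance with x = 2, y = 3, p = 5 it is very nearly 275.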

Many of the most important developments of arithmetic depend 
upon the definition of a prime number and the so called factor 
theorem, namely that every number can be resolved into prime factors 
in one way only. It follows from this fact that if positive integers 
A, B, C ... K, L, of which no two have a common factor, satisfy the 
condition 


ABC ... K = L^p,



then each of the numbers A, B ... K must be a perfect pth power.
Should any of the quantities A, B ... have a common factor, this
result must be slightly modified; for example A now will be a
perfect pth power multiplied by a constant depending on the common
factors mentioned above. Particular cases of this theorem have already
been used. The question immediately suggests itself: Can this theorem
be extended to apply to equation (6), and can we deduce that the
factors x + ζy, x + ζ^2 y, ... are each pth powers of expressions of the form

a + bζ + cζ^2 + ...

or perhaps multiples of such pth powers ? If so, a proof of Fermat’s 
Last Theorem would be fairly easy. 

Arithmetical properties of numbers of the form a + ib 

Before we answer this question, let us consider what occurs in some 
analogous but simpler cases. The simplest case would be the study of 
complex numbers of the form 

a + ib, 

where i = \sqrt{-1} and a and b are rational numbers. When a and b are
integers it seems natural to call the complex number a + ib a complex
integer. When b = 0 the complex integer becomes an ordinary integer.
Further it is obvious that the sum, difference or product of two 
complex integers is also a complex integer, so that the definition of a 
complex integer is consistent. 

As regards division, a complex integer a + ib is said to be divisible 
by a complex integer c + id if a complex integer x + iy can be found 
so that 

a + ib = (c + id) (x + iy). 

We note that while 1 is exactly divisible by only two integers, namely 
± 1, it is exactly divisible by four integers in the complex theory, 
namely ± 1, ± i. The divisors of unity are called units.
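The definition of divisibility translates directly into a test. In the Python sketch below, which is an illustration with names of our own, a complex integer a + ib is represented by the pair (a, b), and the quotient (a + ib)/(c + id) is formed by multiplying numerator and denominator by c - id.

```python
def divides(c: int, d: int, a: int, b: int) -> bool:
    """True if c + id divides a + ib in the complex integers."""
    norm = c * c + d * d
    if norm == 0:
        return False
    # (a + ib)(c - id) = (ac + bd) + i(bc - ad); both parts must be
    # divisible by c^2 + d^2 for the quotient to be a complex integer.
    return (a * c + b * d) % norm == 0 and (b * c - a * d) % norm == 0


def units(search: int = 2):
    """Complex integers c + id with |c|, |d| <= search which divide 1."""
    return sorted(
        (c, d)
        for c in range(-search, search + 1)
        for d in range(-search, search + 1)
        if divides(c, d, 1, 0)
    )
```

The search returns exactly the four units 1, -1, i, -i.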

The question now arises, “ What is the definition of a prime number 
in the new theory?” The odd primes 

3, 5, 7, 11, 13, 17, ... 
can be divided into two groups such as 

5, 13, 17, 29, 37, ... 

of which every one leaves the remainder 1 when divided by 4 ; and 
3, 7, 11, 19, ... 

every one of which leaves the remainder 3 when divided by 4. 



The numbers in the first group, however, are no longer primes 
in the complex theory. For it is clear that 

5 = (2 + i)(2 - i),

13 = (3 + 2i)(3 - 2i),

17 = (4 + i)(4 - i),

and it can be shown that every prime number of the form 4n + 1 can
be expressed in this way, that is

4n + 1 = (a + ib)(a - ib) = a^2 + b^2,

where a and b are integers. This fact was indeed stated by Fermat 
and first proved by Euler, but it is of an entirely different kind from 
the theorems of elementary arithmetic. 

The numbers in the second group cannot be factorised in this way, 
for then 

4n + 3 = (a + ib)(a - ib) = a^2 + b^2.

But as already remarked, the square of any integer when divided by 4
leaves a remainder 0 or 1. Hence a^2 + b^2 when divided by 4 can only
leave a remainder 0, 1, 2 and not 3. This proves the statement. 
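Both statements can be observed for small primes. The Python sketch below is an illustration only; the function name two_squares is ours, and a search over a few primes is not a proof of the general theorem.

```python
from math import isqrt


def two_squares(p: int):
    """Return (a, b) with a >= b >= 1 and a^2 + b^2 = p, or None."""
    for b in range(1, isqrt(p) + 1):
        rest = p - b * b
        a = isqrt(rest)
        if a >= b and a * a == rest:
            return (a, b)
    return None
```

The primes 5, 13, 17 give (2, 1), (3, 2), (4, 1), in agreement with the factorisations above, while 3, 7, 11, 19 give no representation.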

The behaviour of the even prime 2 is very different from that 
of the odd primes. For 

2 = i(1 - i)^2,

so that 2 is practically* a square number in the complex theory.

We can now define the prime numbers of the complex theory. 
These are the numbers 

3, 7, 11, 19, ...,

that is the primes of the form 4n + 3 in the ordinary theory; the
complex quantities

2 ± i, 3 ± 2i, ..., a ± ib,

which are the factors of 5, 13, ... and of the primes of the form 4n + 1;
and lastly 1 - i. 

The factor theorem for primes in the new theory

It can be shown that in the complex theory, the primes, as just 
defined, have properties practically identical with those of the ordinary 
primes. For example every complex integer can be resolved into prime 
factors in one way and only one way, noting of course that the factors 
a + ib, -(a + ib), ± i(a + ib)
are not considered as different. 


* That is, except for the unit factor i. 




Suppose for instance that a + ib is a factor of the number p which 
is a prime in the ordinary theory, so that 

a^2 + b^2 = p.

Then x + iy is divisible by a + ib if

(x + iy)/(a + ib) = (x + iy)(a - ib)/(a^2 + b^2)

is a complex integer. Hence

ξ = ax + by and η = ay - bx
must both be divisible by p. But as

aξ - bη = (a^2 + b^2)x = px,

and a and b are both prime to p, it is clear that ξ and η are both
divisible by p if one of them is. Hence the condition that*

(x_1 + iy_1)(x_2 + iy_2) ≡ 0 (mod (a + ib)),
is that b(x_1 x_2 - y_1 y_2) - a(x_1 y_2 + x_2 y_1) ≡ 0 (mod p),

or multiplying by b

b^2(x_1 x_2 - y_1 y_2) - ab(x_1 y_2 + x_2 y_1) ≡ 0 (mod p),
which since a^2 + b^2 ≡ 0 (mod p),

can be written as

(ay_1 - bx_1)(ay_2 - bx_2) ≡ 0 (mod p).

Hence one of these two factors must be divisible by p, that is, the
corresponding factor x_1 + iy_1 or x_2 + iy_2 is divisible by a + ib. This shows
that practically the same arithmetical laws hold for complex integers 
as for ordinary ones. 

The equation x^2 + y^2 = z^n

We can now solve the equation

x^2 + y^2 = z^n (6a),

where x, y and z are ordinary integers no two of which have a common
factor, by writing

(x + iy)(x - iy) = z^n.

Now in the complex theory x and y are still prime to each other, so
that the common factor of the two complex quantities x + iy, x - iy
must be a divisor of 2, that is the common factor must be 1, 1 + i or 2.
We can exclude 2 because then x and y would both be even.

We can also exclude 1 + i, for if

(x + iy)/(1 + i) = (x + y + i(y - x))/2

* The statement A ≡ B (mod C) means that A - B is divisible by C.



were a complex integer, x + y and x - y would both be divisible by 2,
or since x and y are both prime to each other, this means that x and y
are both odd. But this is impossible for then x^2 + y^2 would be double
an odd number and could not be a perfect nth power if n > 1. Hence
as x + iy and x - iy are prime to each other, each of them, except for a
unit factor, must be a perfect nth power, so that the solution of the
equation (6a) is given by

x + iy = i^r (a + ib)^n, x - iy = i^{-r} (a - ib)^n,
z = a^2 + b^2,

where a and b are ordinary integers such that x, y and z have no 
common factor, and r is any integer. 
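The solution can be produced mechanically by raising a + ib to the nth power in integer arithmetic. The Python sketch below is an illustration with a function name of our own; it takes the unit factor i^r to be 1.

```python
def gaussian_power(a: int, b: int, n: int):
    """Return (x, y) with x + iy = (a + ib)**n, using integers only."""
    x, y = 1, 0
    for _ in range(n):
        # (x + iy)(a + ib) = (xa - yb) + i(xb + ya)
        x, y = x * a - y * b, x * b + y * a
    return x, y
```

For a = 2, b = 1, n = 3 this gives x = 2, y = 11, and indeed 2^2 + 11^2 = 125 = 5^3. Multiplying by the unit i replaces (x, y) by (-y, x) and leaves x^2 + y^2 unchanged.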

Numbers of other kinds, and their factorisation 

The difficulties arising then in the discussion of complex integers 
of the type a + ib are comparatively simple. These complex integers 
naturally suggest algebraic integers of a more general type, such as for 
example, a + b\sqrt{m}, where a, b and m are rational quantities. We shall
consider in particular the study of the quantities x = a + b\sqrt{-5}, where
a and b are rational, and shall call x an algebraic integer if a and b are
integers. A more general definition is that x is an algebraic integer if
it is a root of an algebraic equation of which the coefficients are integers,
while the coefficient of the highest power of x is unity. The two
definitions are equivalent in our special case, though they would not be
so if \sqrt{-5} were replaced by \sqrt{5}. The addition, subtraction, division
or multiplication of such integers calls for no comment. It is when
we start to factorise such algebraic integers that a difficulty soon
arises. It seems natural to call an algebraic integer x + y\sqrt{-5} a prime
when it is not divisible by a + b\sqrt{-5} unless a = ± x, b = ± y. Let us
accept this definition of a prime.

Take the number 21 for example. Clearly we have

21 = 3 x 7 = (4 + \sqrt{-5})(4 - \sqrt{-5}) = (1 + 2\sqrt{-5})(1 - 2\sqrt{-5}).

But 4 + \sqrt{-5} cannot be resolved into a product of factors of algebraic
integers of the form a + b\sqrt{-5}. For if this were possible, then
4 + \sqrt{-5} = (a + b\sqrt{-5})(c + d\sqrt{-5}),

say, so that 4 - \sqrt{-5} = (a - b\sqrt{-5})(c - d\sqrt{-5}).

Hence by multiplication

21 = (a^2 + 5b^2)(c^2 + 5d^2),




so that a^2 + 5b^2, being a factor of 21, must be either 1, 3, 7 or 21.
This gives a = ±1, b = 0; or a = ±4, b = ±1; or a = ±1, b = ±2. The
solution a = ±1, b = 0 does not give a factor of 4 + \sqrt{-5}, while the
solution a = ±4, b = ±1 does not give a factor, since

(4 + \sqrt{-5})/(4 - \sqrt{-5}) = (11 + 8\sqrt{-5})/21

is not a complex integer. It is also easily seen that ±1 ± 2\sqrt{-5}
is not a factor.

In the same way it is found that neither 1 + 2\sqrt{-5}, 3 nor 7 splits
into factors of the form a + b\sqrt{-5}. We have then 21 expressed as a
product of primes in three different ways. Moreover

(4 + \sqrt{-5})(4 - \sqrt{-5}) is divisible by 3,

where 3 is a prime in the new theory, while 4 + \sqrt{-5} and 4 - \sqrt{-5}
are both prime to 3. Hence the factor theorem of arithmetic which
states that the product of two integers ab cannot be divisible by
a prime p, unless either a or b is divisible by p, no longer applies.
This breaking down of one of the fundamental laws of arithmetic for
integers of the type a + b\sqrt{-5} brings us face to face with a great
difficulty, and suggests that the method of defining a prime in the
present instance, in the same manner as for integers of the form a + ib,
is not satisfactory. It is however not obvious at first sight how to
suggest an alternative method. Unless this is done, we cannot deduce
from the equation xy = z^2, where x and y are prime to each other, that
x and y are both perfect squares.

For example (2 + \sqrt{-5})(2 - \sqrt{-5}) = 9,

where the factors 2 ± \sqrt{-5} are primes, have no common factor, and
are not equal to the squares of integers of the form a + b\sqrt{-5}.
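The arithmetic of this example can be checked with a few lines of Python. The sketch is an illustration only; the pair (a, b) stands for a + b\sqrt{-5}, and the names norm, elements_of_norm and multiply are ours. Since a proper factor of a number whose norm a^2 + 5b^2 is 21 or 9 would have norm 3 or 7, it is enough to observe that no number of these norms exists.

```python
from math import isqrt


def norm(a: int, b: int) -> int:
    """(a + b*sqrt(-5)) times its conjugate, i.e. a^2 + 5 b^2."""
    return a * a + 5 * b * b


def elements_of_norm(n: int):
    """All pairs (a, b) with a^2 + 5 b^2 = n."""
    bound = isqrt(n)
    return [
        (a, b)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if norm(a, b) == n
    ]


def multiply(u, v):
    """Product of a + b*sqrt(-5) and c + d*sqrt(-5) as a pair."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)
```

The three factorisations of 21, and that of 9, are then confirmed by multiply, while elements_of_norm returns an empty list for 3 and for 7.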

The difficulty arising in the general equation

It is now clear that given an equation of the form

(x + y)(x + ζy)(x + ζ^2 y) ... (x + ζ^{p-1} y) = z^p (6),

where ζ is a complex pth root of unity, it cannot be asserted that
x + ζy, for example, is the pth power of an expression of the form

a + bζ + cζ^2 + ... + lζ^{p-1},

until an investigation has been made of the arithmetical properties of 
such algebraic integers. It may happen that the definition of a prime 
in the new theory in the same way as for numbers of the form a + ib 




will be satisfactory, in which case the algebraic integers can be factor- 
ised in one way only ; or the same difficulty may arise as in the case of 
numbers of the form a + b\sqrt{-5}. As a matter of fact, it is not true
that the algebraic numbers above can be factorised uniquely, but the
first case of failure occurs when p = 23. It is not surprising then that
such mathematicians as Lamé, Cauchy and even Kummer should
have been originally under the impression that the algebraic integers
above could be factorised uniquely.

Lamé made this false assumption in giving a proof of Fermat’s
Last Theorem, as was pointed out by Liouville and Kummer. Kummer
also had previously made the same mistake in attempting a proof, as
was pointed out by Dirichlet, who expressed his belief that the algebraic
numbers involved could not in general be factorised uniquely. 

The question before us, then, is the removal of the difficulty men- 
tioned above, and a very simple illustration due to Hilbert may show 
us how this can be done. Let us consider only the odd integers of the 
form 4n + 1, that is to say 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45,
49, 53, 57, 61, 65, 69 ..., and investigate what happens when we 
attempt to build up the arithmetical laws for this group of integers, 
say a, b, c .... A number a will now be called a prime if it cannot be 
expressed in the form a = bc unless b or c equals unity. Thus 21
would be a prime number in the new theory, for although

21 = 3 x 7,

neither 3 nor 7 is included in the group of integers of the form 4n + 1.
Also 49 x 9 = 21^2, and neither 9 nor 49 is the square of a number of
the group, while 9 and 49 are prime to each other, since no number* 
of the group divides both of them. Again 693 can be factorised as 
693 = 9 x 77 = 21 x 33, 

that is in two essentially different ways, since 9, 21, 33 and 77 must be 
considered as primes in the new theory. 
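Hilbert's illustration is easily reproduced. The following Python sketch is an illustration only, with function names of our own; it treats the integers of the form 4n + 1 as the whole of arithmetic and lists the factorisations of a number into primes of the new theory.

```python
from math import isqrt


def is_new_prime(a: int) -> bool:
    """True if a = 4n + 1 > 1 is not a product of two smaller numbers 4n + 1."""
    if a % 4 != 1 or a == 1:
        return False
    # If b = 4k + 1 divides a = 4n + 1, the quotient is again of the form 4m + 1.
    for b in range(5, isqrt(a) + 1, 4):
        if a % b == 0:
            return False
    return True


def factorisations(n: int, smallest: int = 5):
    """All ways of writing n as a non-decreasing product of new primes."""
    if n == 1:
        return [[]]
    result = []
    for b in range(smallest, n + 1, 4):
        if n % b == 0 and is_new_prime(b):
            for rest in factorisations(n // b, b):
                result.append([b] + rest)
    return result
```

For 693 the function returns the two factorisations 9 x 77 and 21 x 33 mentioned in the text.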

The difficulties arising now are of exactly the same kind as arose in 
the consideration of algebraic numbers of the form a + b\sqrt{-5}. But
the way out of the difficulty is obvious in the present case. Instead of
considering only the integers 1, 5, 9, 13, ..., we consider in addition
the odd numbers of the form 4n + 3, for example 3, 7, 11, .... Then
we know that in the new group of integers, 1, 3, 5, 7, 9, ..., the
ordinary laws of arithmetic hold, and now

9 = 3^2, 77 = 7 x 11, 21 = 3 x 7, 33 = 3 x 11,

* Except unity.




so that the two different methods of factorising 693 reduce to the one way
693 = 3^2 x 7 x 11.

This method of removing the difficulty is very simple and general,
and suggests at once the question: Can this idea be extended to
the algebraic numbers of the form a + b\sqrt{-5}; or in other words can
the group of algebraic numbers of the form a + b\sqrt{-5} be enlarged by
joining a new group of numbers, so that the factor law of arithmetic
holds for this enlarged group? The answer is in the affirmative, not
only for these special algebraic numbers, but for all algebraic numbers.
The method of doing this has been presented in three different ways* by
Kummer†, Dedekind, Kronecker and Weber. The principles underlying
them are essentially the same, and are now included under what is
known as the arithmetical theory of algebraic numbers. 

Introduction of ideal numbers 

The methods may be made clearer if presented in a rather different 
way from those of the investigators above, but which, though very 
useful for giving a clear insight into the matter, would prove rather 
difficult if made the starting point of an investigation for the general 
algebraic number. 

For our purpose it may be sufficient to say that instead of considering
algebraic numbers of the form a + b\sqrt{-5} we consider the new
group of numbers defined by τ, where

τ^2 = x + y\sqrt{-5},

and x and y are any integers, whose greatest common factor is a perfect
square or 5 times a perfect square, and satisfying the condition that

x^2 + 5y^2

should be a perfect square. The group of algebraic integers now arising
reproduces itself by multiplication and its members may be called
ideal numbers. It includes as part of itself the numbers of the form
a + b\sqrt{-5}, of which we have already spoken.

The ordinary primes 

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47 ... 
can now be divided into three classes, while 5 is in a class by itself. 

* See Bachmann, Zahlentheorie, Vol. v. p. 521, for two other methods by
Hensel and Sochotzki.

† Kummer dealt only with the algebraic numbers arising from the complex roots
of unity.



The first class consists of primes such as 29, 41, 61 ..., which can be 
written in the form 

29 = (3 + 2\sqrt{-5})(3 - 2\sqrt{-5}),

41 = (6 + \sqrt{-5})(6 - \sqrt{-5}),

61 = (4 + 3\sqrt{-5})(4 - 3\sqrt{-5}),

that is, each of them can be expressed in the form a^2 + 5b^2.

The second class consists of primes such as 3, 7, 23 ..., which
cannot be expressed in the form a^2 + 5b^2, though their squares can be
expressed in this form, for example

3^2 = (2 + \sqrt{-5})(2 - \sqrt{-5}),

7^2 = (2 + 3\sqrt{-5})(2 - 3\sqrt{-5}),
and in general for any prime p of this kind

p^2 = c^2 + 5d^2,

or if ideal numbers τ_1, τ_2 are defined by

τ_1^2 = c + d\sqrt{-5}, τ_2^2 = c - d\sqrt{-5},
then p can be factorised in the form

p = τ_1 τ_2.

The third class of primes consists of primes, say q, such as 
2, 11, 13, 17 ..., 

which are such that neither they nor their squares can be expressed* in
the form a^2 + 5b^2. A very simple rule enables us to distinguish between
the three classes of primes, but the principles employed depend upon
the Theory of Numbers. The prime 5 as already remarked is in a class
by itself and since 5 = -(\sqrt{-5})^2, 5 is practically a square number.
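The three classes can be separated by direct search, without appeal to the simple rule alluded to. The Python sketch below is an illustration only; the function names are ours, the search requires b ≠ 0 as in the footnote, and the prime 5 is given the label 0 to mark that it stands in a class by itself.

```python
from math import isqrt


def representable(n: int) -> bool:
    """True if n = a^2 + 5 b^2 with integers a, b and b != 0."""
    b = 1
    while 5 * b * b <= n:
        rest = n - 5 * b * b
        a = isqrt(rest)
        if a * a == rest:
            return True
        b += 1
    return False


def prime_class(p: int) -> int:
    """Class 1, 2 or 3 of the prime p as described in the text; 0 for p = 5."""
    if p == 5:
        return 0
    if representable(p):
        return 1
    if representable(p * p):
        return 2
    return 3
```

The primes 29, 41, 61 fall in the first class, 3, 7, 23 in the second, and 2, 11, 13, 17 in the third, as stated.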


The proof of a unique factorisation law 
It can be shown that the new group of algebraic integers of the
type specified by τ = \sqrt{x + y\sqrt{-5}}, and which includes the complex
numbers of the form m + n\sqrt{-5} as a special case, can be factorised
uniquely by means of the quantities just denoted by a + b\sqrt{-5},
\sqrt{c + d\sqrt{-5}}, q and \sqrt{-5}.

This follows from the condition of divisibility of an ideal number
by the new primes, just as in the case of complex numbers of the form
x + iy. For example, if the ideal number \sqrt{x + y\sqrt{-5}} is divisible by
\sqrt{2 + \sqrt{-5}}, say τ, so that τ is an ideal factor of 3, we must have

\sqrt{x + y\sqrt{-5}} = \sqrt{2 + \sqrt{-5}} \sqrt{m + n\sqrt{-5}},


* We assume that b ≠ 0.




where m and n are integers. It is easily seen that
9m = 2x + 5y, 9n = -x + 2y,
and that both m and n are integers if

x - 2y ≡ 0 (mod 9).

Similarly the number x + y\sqrt{-5} is divisible by τ if

\sqrt{x^2 - 5y^2 + 2xy\sqrt{-5}} is divisible by τ.

The condition for this is that

x^2 - 5y^2 - 4xy ≡ 0 (mod 9),
that is (x - 2y)^2 ≡ 0 (mod 9),

or x - 2y ≡ 0 (mod 3).

We can now prove that if the product of two algebraic numbers is
divisible by τ, one of them is divisible by τ. Let the numbers be

x_1 + y_1\sqrt{-5}, x_2 + y_2\sqrt{-5}.

Their product

x_1 x_2 - 5y_1 y_2 + \sqrt{-5}(x_1 y_2 + x_2 y_1)

being divisible by τ, we must have

x_1 x_2 - 5y_1 y_2 - 2(x_1 y_2 + x_2 y_1) ≡ 0 (mod 3),
that is (x_1 - 2y_1)(x_2 - 2y_2) ≡ 0 (mod 3).

This means that one of these two factors must be divisible by 3, so
that, if say,

x_1 - 2y_1 ≡ 0 (mod 3),

then from the above x_1 + y_1\sqrt{-5} must be divisible by τ.

The same result follows if we consider the product of two ideal
numbers \sqrt{x_1 + y_1\sqrt{-5}}, \sqrt{x_2 + y_2\sqrt{-5}}. This product is divisible by
τ if

x_1 x_2 - 5y_1 y_2 - 2(x_1 y_2 + x_2 y_1) ≡ 0 (mod 9),
or (x_1 - 2y_1)(x_2 - 2y_2) ≡ 0 (mod 9),

and a simple discussion shows that if x_1 - 2y_1 is divisible by 3, it is
also divisible by 9 (noting that x_1^2 + 5y_1^2 is a square). Hence as before
one of the ideal numbers must be divisible by τ.

It is clear now that being given an algebraic number $x + y\sqrt{-5}$,
the condition

$$x - 2y \equiv 0 \pmod 3$$

is sufficient to define the ideal prime factor $\sqrt{2 + \sqrt{-5}}$ of the complex
number, and that the actual form of the ideal number need not be



given explicitly. Dedekind, for example, put $x - 2y = 3m$, so that
$x + y\sqrt{-5}$ becomes $3m + y(2 + \sqrt{-5})$, and then considered the
properties of the groups of numbers arising by taking different values for
m and y, and called the group of numbers an ideal. Kronecker however
would have studied the linear expression as a function of x, y;
while Kummer would have used the congruence

$$x - 2y \equiv 0 \pmod 3$$

as defining the ideal prime $\tau$.

Application of ideal numbers to Fermat’s Last Theorem 
For algebraic numbers of the form

$$a + b\zeta + c\zeta^2 + \dots$$

Kummer showed that the ideal numbers were of the form

$$\sqrt[r]{a_1 + b_1\zeta + c_1\zeta^2 \dots},$$

where $a_1, b_1, c_1, \dots$ are integers satisfying certain conditions, while r
is a factor of a number called the number of ideal classes. Its value
depends only on p, and can be found by a very complicated method
depending on principles introduced into analysis by Dirichlet.
Continuing now the discussion of the equation

$$(x + y)(x + \zeta y) \dots (x + \zeta^{p-1} y) = z^p \qquad (6),$$

consider first the case when z is prime to p. This is tantamount to
saying that no two of the factors on the left-hand side have a common
factor. Hence by introducing ideal numbers, we have, practically as in
the case of the equation $x^2 + y^2 = z^m$, a number of equations of the form

$$x + \zeta y = \epsilon\,\tau^p,$$
$$x + \zeta^2 y = \epsilon_2\,\tau_2^p,$$

where $\tau, \tau_2, \dots$ are ideal numbers and $\epsilon, \epsilon_2, \dots$ are units, i.e. quantities
of the form

$$a_2 + b_2\zeta + c_2\zeta^2 \dots,$$

which are divisors of unity.

Noting now the explicit expression for $\tau$, namely

$$\tau = \sqrt[r]{a + b\zeta + c\zeta^2 \dots},$$

we have

$$x + \zeta y = \epsilon \left(\sqrt[r]{a + b\zeta + c\zeta^2 \dots}\right)^p.$$

If now r is prime to p, and this is an extremely important condition, it
follows that

$$a + b\zeta + c\zeta^2 \dots$$



is the pth power of a similar expression, so that we can write

$$x + \zeta y = \epsilon\,(\alpha + \beta\zeta + \gamma\zeta^2 \dots)^p,$$

where $\alpha, \beta, \gamma, \dots$ are integers. We find other equations by changing
$\zeta$ into $\zeta^2, \zeta^3, \dots$. It is then a comparatively simple matter to show
that equations of this kind are impossible, not only when x and y are
ordinary integers but also when they are integers of the form

$$A + B\zeta + C\zeta^2 + \dots.$$


A similar conclusion can be drawn in the case when z is not prime
to p. Hence Fermat’s Last Theorem is proved in all the cases where r
or the number of classes of ideals is prime to p. The condition for
this can be stated in a remarkable form by noting the following
expansion in ascending powers of x, namely

$$\frac{x}{e^x - 1} = 1 - \frac{x}{2} + \sum_{n=1}^{\infty} (-1)^{n-1} \frac{B_n x^{2n}}{(2n)!},$$

so that $B_1 = \frac{1}{6}$, $B_2 = \frac{1}{30}$, $B_3 = \frac{1}{42}$, ... are the well known Bernoulli’s
numbers. Then the required condition is that the numerators of none
of the first $\frac{1}{2}(p - 3)$ of the Bernoulli’s numbers should be divisible
by p. The only primes less than 100 for which this condition is not
satisfied are p = 37, 59, 67, and hence it is proved* that

$$x^p + y^p = z^p$$

is impossible if p is an odd prime less than 100, except when p = 37, 59, 67.
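The Bernoulli-number condition can be tested directly with exact rational arithmetic. The sketch below is an illustration added here, not Kummer's method: it assumes the indexing in which the nth Bernoulli's number of the text is the absolute value of the modern Bernoulli number with index 2n, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(count):
    """Return [B_1, ..., B_count] with B_1 = 1/6, B_2 = 1/30, B_3 = 1/42, ..."""
    # Modern Bernoulli numbers b_0, b_1, ... from the recurrence
    # sum_{k=0}^{m} C(m+1, k) * b_k = 0 for m >= 1, with b_0 = 1.
    b = [Fraction(1)]
    for m in range(1, 2 * count + 1):
        b.append(-sum(comb(m + 1, k) * b[k] for k in range(m)) / (m + 1))
    # Assumed convention: the text's B_n is |b_{2n}|.
    return [abs(b[2 * n]) for n in range(1, count + 1)]


def odd_primes_below(limit):
    """Odd primes below limit by trial division."""
    return [n for n in range(3, limit)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]


def exceptional_primes(limit):
    """Odd primes p < limit dividing a numerator among the first (p-3)/2 numbers."""
    table = bernoulli_numbers((limit - 3) // 2)
    found = []
    for p in odd_primes_below(limit):
        first = table[:(p - 3) // 2]
        if any(b.numerator % p == 0 for b in first):
            found.append(p)
    return found
```

With these definitions `exceptional_primes(100)` should list exactly the primes below 100 for which the condition fails.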

In order to establish the truth of the theorem for these exceptional
values of p, Kummer gave in 1857 some additional results for primes
satisfying certain conditions†. These conditions were satisfied by p = 37,
59, 67, so that Fermat’s Last Theorem is proved for all values of p,
prime or otherwise, less than 100, omitting of course p = 2.


Another result by Kummer 

Some other important consequences were deduced by Kummer in 
the special case when one of the unknowns is not divisible by p. We 

* A complete account of Kummer’s theory of ideal numbers is given in
Liouville's Journal, t. xvi. 1851. Hilbert in his well known report on “Die
Theorie der algebraischen Zahlkörper” gives the modern version. The French
translation of this report has an appendix giving other results on Fermat’s Last
Theorem. For a good introduction, see Sommer, Vorlesungen über Zahlentheorie.

† It appears that Kummer has made some errors which vitiate his proof for
the cases p = 37, 59, 67. See Vandiver “On Kummer’s memoir of 1857 concerning
Fermat’s Last Theorem,” Proceedings of the National Academy of Science,
Washington, U.S.A., Vol. vi. May, 1920. The case p = 37, however, was proved
impossible by Mirimanoff in 1892, so that the cases p = 59, 67 are still doubtful.




saw previously that we could deduce from equation (6) a series of
equations of the form

$$x + \zeta^r y = \epsilon_r\,\tau_r^p,$$

where r = 1, 2, ... p. Kummer showed that it was possible to select
$\frac{1}{2}(p - 1)$ of these equations in such a way that the product of the
corresponding $\tau_r$’s is an actual number of the form

$$T = a + b\zeta + c\zeta^2 + \dots.$$

By multiplying together the $\frac{1}{2}(p - 1)$ equations, he
obtained a result of the form

$$\prod (x + \zeta^r y) = \pm\zeta^k\,T^p,$$

where the multiplication on the left refers to the selected $\frac{1}{2}(p - 1)$ factors.
Replacing $\zeta$ by $e^v$, we have an identity of the form

$$\prod (x + e^{rv} y) = \pm e^{kv}(a + be^v + \dots)^p + (1 + e^v + e^{2v} + \dots + e^{(p-1)v})\,f(v),$$

where f(v) is a polynomial in $e^v$ with integral coefficients. By
differentiation he deduced* that

$$B_n \left[\frac{d^{p-2n}}{dv^{p-2n}} \operatorname{Log}(x + e^v y)\right]_{v=0} \equiv 0 \pmod p,$$

when n = 1, 2, ... $\frac{1}{2}(p - 3)$, where $B_n$ is the nth Bernoullian number as
defined before, a result which has since proved very useful.

Kummer, although not a candidate for a prize offered by the 
French Academy for a proof of Fermat’s Last Theorem, was awarded 
it in recognition of his researches on complex numbers. Certainly 
never was an investigator on these subjects more worthy of one. 


Deductions from Kummer’s last result

For about fifty years after Kummer’s work, very little was accomplished
either in extending or in developing the full consequences of
his results on Fermat’s Last Theorem. In the early part of this century,
however, mathematicians turned once more to Kummer’s results, and
in particular to the one, that if

$$x^p + y^p = z^p$$

had solutions for which z is prime to p, then

$$B_n \left[\frac{d^{p-2n}}{dv^{p-2n}} \operatorname{Log}(x + e^v y)\right]_{v=0} \equiv 0 \pmod p$$

for n = 1, 2, 3, ... $\frac{1}{2}(p - 3)$.

* Abhand. Ak. Wiss. Berlin, 1857.




This result, by putting

$$p - 2n = i,$$

can be written in the slightly different form

$$B_{\frac{1}{2}(p-i)} \left[\frac{d^i}{dv^i} \operatorname{Log}(x + e^v y)\right]_{v=0} \equiv 0 \pmod p \qquad (7),$$

where

$$i = 3, 5, \dots, p - 4, p - 2.$$

Take now i = 3, then

$$B_{\frac{1}{2}(p-3)} \left[\frac{d^3}{dv^3} \operatorname{Log}(x + e^v y)\right]_{v=0} \equiv 0 \pmod p,$$

and this reduces to

$$B_{\frac{1}{2}(p-3)}\,xy(x - y) \equiv 0 \pmod p.$$

Hence if $B_{\frac{1}{2}(p-3)}$ is not divisible by p,

$$xy(x - y) \equiv 0 \pmod p.$$

If then the equation has a solution for which x, y, z are all prime to p,

$$x \equiv y \pmod p;$$

and in exactly the same way

$$x \equiv z \pmod p.$$

This gives

$$3x^p \equiv 0 \pmod p,$$

which is impossible if p is not equal to 3.
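The reduction in the case i = 3 can be made explicit. The short derivation below is a sketch supplied for the reader, under the assumption that Log denotes the natural logarithm; the auxiliary symbol u is introduced here and does not appear in the original.

```latex
% Put u = y e^{v}, so that du/dv = u and u = y at v = 0.
\begin{align*}
\frac{d}{dv}\,\log(x+u)     &= \frac{u}{x+u}, \\
\frac{d^{2}}{dv^{2}}\,\log(x+u) &= \frac{u(x+u)-u^{2}}{(x+u)^{2}} = \frac{xu}{(x+u)^{2}}, \\
\frac{d^{3}}{dv^{3}}\,\log(x+u) &= \frac{xu(x+u)-2xu^{2}}{(x+u)^{3}} = \frac{xu\,(x-u)}{(x+u)^{3}}.
\end{align*}
% At v = 0 this gives
\[
\left[\frac{d^{3}}{dv^{3}}\,\log\bigl(x+e^{v}y\bigr)\right]_{v=0}
  = \frac{xy\,(x-y)}{(x+y)^{3}} .
\]
```

Since x + y divides $x^p + y^p$, it is prime to p whenever z is, so the denominator may be cleared modulo p, which leaves the factor xy(x - y).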

Hence we have proved the result that the equation

$$x^p + y^p = z^p$$

cannot have integer solutions for which x, y and z are all prime to p
unless $B_{\frac{1}{2}(p-3)}$ is divisible by p. In the same way, by taking i = 5, 7, 9,
it is found that in addition

$$B_{\frac{1}{2}(p-5)}, \quad B_{\frac{1}{2}(p-7)}, \quad B_{\frac{1}{2}(p-9)}$$

must be divisible by p. This was proved by Mirimanoff in 1905, for
the last two of the four Bernoulli’s numbers above, and by Kummer
for the first two, but the case for $B_{\frac{1}{2}(p-3)}$ had been practically announced
previously by Cauchy, although without proof.

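Mirimanoff also showed by developing the value of

$$\left[\frac{d^i}{dv^i} \operatorname{Log}(x + e^v y)\right]_{v=0}$$

that, if x, y, z are all prime to p, then Kummer’s result (7) could be
expressed in the form*

$$B_{\frac{1}{2}(p-i)} \left(t - 2^{i-1}t^2 + 3^{i-1}t^3 \dots \pm (p-1)^{i-1}t^{p-1}\right) \equiv 0 \pmod p,$$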


* Crelle’s Journal, Vol. cxxviii.



or say

$$B_{\frac{1}{2}(p-i)}\,\phi_i(t) \equiv 0 \pmod p,$$

where i = 3, 5, ... p - 2,

and t is the ratio of any pair of x, y, z. Or again in the form

$$\phi_n(t)\,\phi_{p-n}(t) \equiv 0 \pmod p$$

where n = 1, 2, ... p - 1.

The congruences $2^{p-1} \equiv 1 \pmod{p^2}$, $3^{p-1} \equiv 1 \pmod{p^2}$ ....

A number of conditions can be found by eliminating t from the
above congruences. Although there seems no a priori reason for
expecting simple results, Wieferich* showed in 1909 that one of these
conditions could be expressed in the surprisingly simple form

$$2^{p-1} \equiv 1 \pmod{p^2}.$$

This extremely simple and unexpected result represented the first
real advance made in the subject since Kummer’s work. The
Göttingen Academy of Science awarded him 100 marks from the interest
of the Wolfskehl fund.

In 1913, Meissner showed that p = 1093 was the only prime less
than 2000 for which this congruence was satisfied.

In other words the equation (p a prime)

$$x^p + y^p = z^p,$$

where 2000 > p > 2,

cannot be satisfied by values of x, y and z, each of which is prime to p,
except perhaps when p = 1093.

Simpler proofs of Wieferich’s result were soon given by Frobenius
and Mirimanoff, the latter also showing† that under the same
conditions

$$3^{p-1} \equiv 1 \pmod{p^2}.$$
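Both congruences are easy to test by machine with modular exponentiation. The following is a small sketch added for illustration; the function names are hypothetical and the search limit of 2000 simply mirrors the range quoted above.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def satisfies_congruence(base, p):
    """True when base^(p-1) ≡ 1 (mod p^2)."""
    return pow(base, p - 1, p * p) == 1


def primes_satisfying(base, limit):
    """Odd primes p < limit, p != base, with base^(p-1) ≡ 1 (mod p^2)."""
    return [p for p in range(3, limit)
            if p != base and is_prime(p) and satisfies_congruence(base, p)]


base_two = primes_satisfying(2, 2000)
base_three = primes_satisfying(3, 2000)
```

Comparing `base_two` with `base_three` shows whether any prime below 2000 passes both tests at once, which is what a solution with x, y, z all prime to p would require according to the two results just quoted.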

The two congruences above could not of course have been foreseen
from Kummer’s original results, but another proof was given by
Furtwängler, which seems more natural and simple, depending upon ideas
which should be capable of further extension. The following may
perhaps give some indication of the ideas involved.

Suppose we have two odd primes, say 3 and 7, and it is required
to investigate if integers x and y can be found so that

$$x^2 \equiv 7 \pmod 3, \qquad y^2 \equiv 3 \pmod 7.$$


* Crelle’s Journal, Vol. cxxxvi.
† Crelle’s Journal, Vol. cxxxix.




It is easy to see that the first congruence is satisfied by x = 2, while
the second congruence is impossible. If however 3 and 7 had been
replaced by any two odd primes p and q, a very simple theorem known
as the law of quadratic reciprocity enables us from the known possibility
or impossibility of one of the congruences to determine if the other
congruence is possible or not. But if one of the congruences is
impossible, we can at once conclude that the equation

$$x^2 = py^2 + qz^2$$

is also impossible.
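The example with 3 and 7 can be reproduced by exhaustive search, and the reciprocity law itself can be spot-checked on small primes. This is an illustrative sketch only; it uses Euler's criterion for the Legendre symbol, a standard device that the lecture does not mention, and the helper names are assumptions.

```python
def square_roots(a, modulus):
    """All x in [0, modulus) with x^2 ≡ a (mod modulus)."""
    return [x for x in range(modulus) if (x * x - a) % modulus == 0]


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p not dividing a (Euler's criterion)."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def reciprocity_holds(p, q):
    """Check (p/q)(q/p) = (-1)^(((p-1)/2)((q-1)/2)) for distinct odd primes."""
    sign = -1 if ((p - 1) // 2) * ((q - 1) // 2) % 2 else 1
    return legendre(p, q) * legendre(q, p) == sign


first = square_roots(7, 3)    # solutions of x^2 ≡ 7 (mod 3)
second = square_roots(3, 7)   # solutions of y^2 ≡ 3 (mod 7)
```

Here `first` contains x = 2 while `second` is empty, in agreement with the text; `reciprocity_holds(3, 7)` confirms that the two answers are related as the law predicts.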

This theorem, moreover, can be extended to congruences of the form

$$x^p \equiv P \pmod Q,$$

where x, P, Q are algebraic numbers of the form

$$a + b\zeta + c\zeta^2 + \dots,$$

or even ideal numbers occurring in the theory of such algebraic
numbers. The theorem in this case, known as the general law of
reciprocity, was enunciated and proved by Kummer, but only a special
case, due to Eisenstein, was required in Furtwängler’s* proof. Assuming
this result, Furtwängler’s proof of the results

$$2^{p-1} \equiv 3^{p-1} \equiv 1 \pmod{p^2}$$

is very simple and natural. As already remarked, it seems that a new
application of the laws of reciprocity may be expected to lead to
interesting results.

It appears probable that the 2 and 3 above can be replaced by any
prime q (except p). In 1914, Vandiver† showed that 5 was another
value for q, while Frobenius‡ showed that q might also take the values
11 and 17; and also the values 7, 13, 19, if $p \equiv 5 \pmod 6$.

The proofs, however, are very complicated and depend upon a special
study of the properties of Bernoulli’s numbers. The elimination is
carried out by taking the congruence

$$B_{\frac{1}{2}(p-i)}\,\phi_i(t) \equiv 0 \pmod p,$$

multiplying throughout by an appropriate function of t, say $f_i(t)$, and
then adding together the left-hand sides of the congruences, which
then reduce practically to the form

$$q^{p-1} \equiv 1 \pmod{p^2},$$

for the values q = 2, 3, 5 ... as just noted.

* Sitzungs. Ak. Wiss. Wien (Math.), Vol. cxxi. 1912 II a, pp. 589-592.
† Crelle’s Journal, Vol. cxliv. 1914, p. 314.
‡ Sitzungs. Ak. Wiss. Berlin, 1914, p. 653.


CHAPTER III 

LIBRI’S RESULT 

We shall now pass on to other methods* which have been employed. 
These, although of interest, do not prove the truth of Fermat’s Last 
Theorem for even one case. 

A simple method of attempting to prove the Theorem, which soon
suggests itself to investigators, may be explained by taking the
particular case

$$x^3 + y^3 + z^3 = 0,$$

which has already been considered. It follows from this equation that
one of the unknowns must be divisible by 3; for otherwise each of
them would be of the form $3n \pm 1$, and then their cubes would be
of the form $27n^3 \pm 27n^2 + 9n \pm 1$, that is of the form $9m \pm 1$. But
obviously the sum of three numbers each of the form $9m \pm 1$ cannot
be zero, as this sum is not divisible by 9. Hence one of the unknowns
must be divisible by 3.

Similarly it can be shown that one of the unknowns must be divisible
by 7. For it is easily shown that the cubes of all numbers not divisible
by 7 are of the form $7m \pm 1$, so that the sum of the cubes of three
numbers cannot be divisible by 7, and hence certainly not equal to
zero, unless one of the numbers is divisible by 7.
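The two residue arguments can be confirmed by listing the cubes in question. The snippet below is a minimal sketch for illustration, with hypothetical function names.

```python
def cube_residues(modulus, excluded_prime):
    """Cubes, reduced mod `modulus`, of the numbers not divisible by `excluded_prime`."""
    return sorted({pow(n, 3, modulus)
                   for n in range(1, modulus) if n % excluded_prime})


def zero_sum_possible(residues, modulus):
    """True when three of the residues (repeats allowed) sum to 0 mod `modulus`."""
    return any((a + b + c) % modulus == 0
               for a in residues for b in residues for c in residues)


cubes_mod_9 = cube_residues(9, 3)   # cubes of numbers prime to 3, taken mod 9
cubes_mod_7 = cube_residues(7, 7)   # cubes of numbers prime to 7, taken mod 7
```

In both cases only the residues corresponding to +1 and -1 appear, and `zero_sum_possible` reports that no three of them add up to a multiple of the modulus.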

The question at once arises: Can an infinite number of primes q be
found with the same property as the primes 3 and 7 above; that is to
say, from the fact that $x^3 + y^3 + z^3$ is divisible by q, does it follow that
one of the unknowns must be divisible by q? If so, the equation will
be impossible, since one of the unknowns will be divisible by an infinite
number of primes. A similar question suggests itself for the equation

$$x^p + y^p + z^p = 0.$$

Libri in 1832 stated without proof that an infinite number of primes
such as q did not exist. This was proved by Pellet about 1886,
and independently in 1909 by Dickson and Hurwitz amongst others.
Dickson† also showed that

$$x^p + y^p + z^p$$

* Bachmann, Niedere Zahlentheorie, Vol. ii. Chapter ix. will be found useful
in connection with the first and third chapters of this book.

† Crelle’s Journal, Vol. cxxxv. 1909, p. 181. Cf. also the paper by Hurwitz
in Vol. cxxxvi. p. 272. A simple and elementary proof with rather larger limits for q




could be made divisible by q without any of the quantities x, y, z
being divisible by q if

$$q > (p - 1)^2 (p - 2)^2 + 6p - 2.$$

Hence the suggested method of attack cannot succeed in proving the
truth of Fermat’s Last Theorem.
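Dickson's bound can be explored numerically for small exponents. The sketch below is an illustration under stated assumptions: the search limit of 400 and all function names are choices made here, and a finite search of this kind cannot replace the proof.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def has_unit_solution(p, q):
    """True when x^p + y^p + z^p ≡ 0 (mod q) has a solution with x, y, z prime to q."""
    powers = {pow(n, p, q) for n in range(1, q)}   # nonzero pth-power residues
    return any((-(a + b)) % q in powers for a in powers for b in powers)


def dickson_bound(p):
    """The quantity (p-1)^2 (p-2)^2 + 6p - 2 quoted in the text."""
    return (p - 1) ** 2 * (p - 2) ** 2 + 6 * p - 2


def primes_forcing_divisibility(p, limit):
    """Primes q < limit for which the congruence forces q to divide x, y or z."""
    return [q for q in range(2, limit)
            if is_prime(q) and not has_unit_solution(p, q)]
```

For p = 3 the bound is 20, and `primes_forcing_divisibility(3, 400)` should contain 7 and 13 but no prime above the bound, so only finitely many primes behave like 3 and 7 in the argument above.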

Sophie Germain’s result 

Another line of attack depends upon some formulae discovered 
independently by a number of investigators, among whom may be 
mentioned Legendre, Abel and Peter Barlow, and developed by others 
such as Sophie Germain. It may be noted that Barlow was the first 
Englishman to write a treatise on the Theory of Numbers, and was 
also amongst the earliest writers who have given erroneous proofs of 
Fermat’s Last Theorem. 

Instead of

$$x^p + y^p = z^p,$$

consider the more symmetrical form

$$x^p + y^p + z^p = 0 \qquad (8),$$

which can be written as

$$-z^p = (x + y) \cdot \frac{x^p + y^p}{x + y}.$$

We note now* that either z is not divisible by p, in which case the
two factors on the right-hand side are prime to each other, or that z is
divisible by p, in which case the two factors have a common factor p of
which the first power only is contained in

$$\frac{x^p + y^p}{x + y}.$$

It follows now from symmetry, that if x, y and z are all prime to p,
then

$$y + z = a^p, \qquad \frac{y^p + z^p}{y + z} = \xi^p, \qquad x = -a\xi,$$
$$z + x = b^p, \qquad \frac{z^p + x^p}{z + x} = \eta^p, \qquad y = -b\eta,$$
$$x + y = c^p, \qquad \frac{x^p + y^p}{x + y} = \zeta^p, \qquad z = -c\zeta,$$

from which

$$2x = b^p + c^p - a^p, \qquad 2y = c^p + a^p - b^p, \qquad 2z = a^p + b^p - c^p.$$

(prime or composite) was given by Schur in the Jahresber. d. Deutschen Math.-
Vereinigung, Vol. xxv. 1916.

* See p. 8. 




If however one of the unknowns, say z, is divisible by p, the third
of the above group of equations must be replaced by

$$x + y = p^{p-1}c^p, \qquad \frac{x^p + y^p}{x + y} = p\zeta^p, \qquad z = -cp\zeta.$$

It is in the first case only, however, that important practical
consequences have been deduced. Suppose it is possible to find an odd
prime q satisfying the two following conditions: firstly, that the
congruence

$$x^p + y^p + z^p \equiv 0 \pmod q$$

requires that one of x, y, z must be divisible by q, and secondly that
no integer k can be found satisfying

$$k^p - p \equiv 0 \pmod q;$$

then the equation (8) cannot be satisfied by integers x, y and z each
of which is prime to p.

For since

$$x^p + y^p + z^p = 0,$$

and is hence divisible by q, it follows that one of x, y, z must be
divisible by q, say x. Now since

$$2x = b^p + c^p + (-a)^p$$

is divisible by q, one of a, b, c must be divisible by q. But b cannot be
divisible by q, for then

$$y = -b\eta$$

would be divisible by q, contrary to the hypothesis that x and y are
prime to each other. Similarly c cannot be divisible by q, so that a is
divisible by q. Hence

$$y + z = a^p$$

is divisible by q. As x is divisible by q and

$$\frac{z^p + x^p}{z + x} = \eta^p,$$

it follows that

$$\eta^p \equiv z^{p-1} \pmod q.$$

Moreover from

$$\xi^p = \frac{y^p + z^p}{y + z} = y^{p-1} - y^{p-2}z \dots + z^{p-1},$$

and $y + z \equiv 0 \pmod q$, i.e. $z \equiv -y \pmod q$,

it follows that

$$\xi^p \equiv p z^{p-1} \pmod q,$$

or

$$\xi^p \equiv p\,\eta^p \pmod q.$$




But $\eta$ is prime to q since $y = -b\eta$, so that we can find an integer k to
satisfy

$$\xi \equiv k\eta \pmod q,$$

from which

$$k^p \equiv p \pmod q.$$

But by hypothesis such an integer k does not exist. This proves then
that equation (8) cannot be satisfied by integers each of which is
prime to p. For example if p = 7, i.e.

$$x^7 + y^7 + z^7 = 0,$$

we can take q = 29. For the seventh powers of all numbers prime to
29 leave remainders ±1, ±12 when divided by 29. Hence from

$$x^7 + y^7 + z^7 \equiv 0 \pmod{29},$$

it follows that one of x, y, z must be divisible by 29. Also no number
k can be found such that

$$k^7 \equiv 7 \pmod{29}.$$

Hence the equation can only have solutions for which one of x, y, z is
divisible by 7.
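The two conditions on the auxiliary prime q are finite checks, so the example p = 7, q = 29 can be verified by enumeration. What follows is a minimal sketch with hypothetical function names; it tests a given pair (p, q) and makes no claim about how Sophie Germain found her primes.

```python
def pth_power_residues(p, q):
    """Set of residues n^p mod q for n prime to the prime q."""
    return {pow(n, p, q) for n in range(1, q)}


def forces_divisibility(p, q):
    """First condition: x^p + y^p + z^p ≡ 0 (mod q) forces q to divide x, y or z."""
    powers = pth_power_residues(p, q)
    return not any((-(a + b)) % q in powers for a in powers for b in powers)


def p_is_not_a_pth_power(p, q):
    """Second condition: no integer k satisfies k^p ≡ p (mod q)."""
    return p % q not in pth_power_residues(p, q)


def is_auxiliary_prime(p, q):
    """Both conditions of the theorem together, for primes p and q."""
    return forces_divisibility(p, q) and p_is_not_a_pth_power(p, q)
```

For instance `pth_power_residues(7, 29)` returns the four residues 1, 12, 17 and 28, that is ±1 and ±12, and `is_auxiliary_prime(7, 29)` confirms that both conditions hold.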

The general theorem above is due to Sophie Germain, who gave
the corresponding prime q for all odd primes p less than 100.

Wendt’s form of the result 

Sophie Germain’s general theorem was given in a slightly different
form by Wendt* in 1895. He also gave the following necessary and
sufficient conditions for the existence of a prime typified by q in
the preceding section. Firstly q must be of the form

$$q = 2hp + 1,$$

where h is an integer prime to 3, for otherwise the congruence

$$x^p + y^p + z^p \equiv 0 \pmod q$$

would have solutions† for which x, y, z are all prime to q.

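Secondly q must not be a divisor of the determinant

$$D_{2h} = \begin{vmatrix}
1 & \binom{2h}{1} & \binom{2h}{2} & \cdots & \binom{2h}{2h-1} \\
\binom{2h}{2h-1} & 1 & \binom{2h}{1} & \cdots & \binom{2h}{2h-2} \\
\vdots & \vdots & \vdots & & \vdots \\
\binom{2h}{1} & \binom{2h}{2} & \binom{2h}{3} & \cdots & 1
\end{vmatrix}.$$

* Crelle’s Journal, Vol. cxiii. 1894, p. 335.

† If h = 3n, we could take x = 1, $y = g^{2n}$, $z = g^{4n}$, where g is a primitive root (mod q).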




This second condition was found by considering the condition that
the congruence

$$x^p + y^p + z^p \equiv 0 \pmod q \qquad (9)$$

should have solutions in which no one of x, y, z is divisible by q.

Putting in (9) x = uz, y = -vz,

we have

$$u^p + 1 \equiv v^p \pmod q \qquad (10),$$

where neither u nor v is divisible by q. Hence by a very well known
theorem also due to Fermat, since

$$q = 2hp + 1,$$

$$u^{2hp} \equiv 1 \pmod q, \qquad v^{2hp} \equiv 1 \pmod q.$$

By eliminating u, v between the last three congruences, it can be
shown that the necessary and sufficient condition that the congruence
(9) can be satisfied by values of x, y, z each of which is prime to q is
that q should be a divisor of the determinant $D_{2h}$.

Wendt also replaced Sophie Germain’s condition

$$k^p \not\equiv p \pmod q$$

by

$$p^{2h} \not\equiv 1 \pmod q.$$

From his determinant condition it would be extremely difficult to
prove that a prime q can be found for a given value of p, especially as
from Libri’s theorem it is known that there cannot be more than a finite
number of values of q for a given prime p. It can however be shown
that if any one of the numbers

$$2p + 1, \quad 4p + 1, \quad 8p + 1, \quad 16p + 1, \quad 10p + 1, \quad 14p + 1$$

is a prime, it can be taken as a value for q. Suppose for example that

$$q = 2p + 1$$

is a prime. Then

$$D_2 = \begin{vmatrix} 1 & 2 \\ 2 & 1 \end{vmatrix} = -3,$$

so that q is prime to $D_2$. The other condition

$$p^2 \not\equiv 1 \pmod q,$$

that is

$$(p + 1)(p - 1) \not\equiv 0 \pmod{2p + 1},$$

is also satisfied. Hence if 2p + 1 is a prime, e.g. when p = 3, 5, 11 ...,
the equation (8) cannot be satisfied if x, y and z are all prime to p.
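Wendt's two conditions can be evaluated exactly for small h. The sketch below rests on an assumption that should be stated loudly: it takes $D_{2h}$ to be the circulant determinant whose first row is the binomial coefficients $\binom{2h}{0}, \binom{2h}{1}, \dots, \binom{2h}{2h-1}$, which is the usual form of Wendt's determinant and agrees with the value of $D_2$ given above. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def wendt_determinant(n):
    """Determinant of the n x n circulant built from C(n,0), ..., C(n,n-1).

    Assumed form of D_n; computed by exact Gaussian elimination over fractions.
    """
    c = [comb(n, k) for k in range(n)]
    rows = [[Fraction(c[(j - i) % n]) for j in range(n)] for i in range(n)]
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return 0                      # singular matrix
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det                    # a row swap flips the sign
        det *= rows[col][col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return int(det)


def wendt_conditions_hold(p, h):
    """Both conditions for q = 2hp + 1; the caller must ensure q is prime."""
    q = 2 * h * p + 1
    return wendt_determinant(2 * h) % q != 0 and pow(p, 2 * h, q) != 1
```

For example `wendt_conditions_hold(3, 1)`, `wendt_conditions_hold(5, 1)` and `wendt_conditions_hold(11, 1)` cover the primes q = 7, 11 and 23 of the form 2p + 1, while `wendt_conditions_hold(7, 2)` recovers q = 29 for p = 7. Under the same assumption `wendt_determinant(6)` vanishes, which matches the requirement that h be prime to 3.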
The three congruences (10) can be replaced by

$$u^{2hp} \equiv (u^p + 1)^{2h} \equiv 1 \pmod q,$$




or putting $u^p = w$,

$$w^{2h} \equiv 1 \pmod q, \qquad (w + 1)^{2h} \equiv 1 \pmod q.$$

By a detailed study of these last two congruences for special values
of h, Dickson* showed in 1908 that the equation

$$x^p + y^p + z^p = 0$$

had no solutions in which the unknowns were all prime to p if p < 7000.
Maillet had previously in 1897 proved the truth of this for 100 < p < 223,
while Mirimanoff in 1904 had raised the upper limit to 256†.

* Quart. Journ. Math. Vol. xl. 1908. 

† More details of some of the results in this booklet are given in Bachmann’s
book Das Fermatproblem published in 1919.


ISBN: 0-8218-2674-3 





CHEL/108.H 



